Accumulo user mailing list: compressing values returned to scanner


Re: compressing values returned to scanner
On Tue, Oct 2, 2012 at 2:48 PM, ameet kini <[EMAIL PROTECTED]> wrote:
> In re-reading your response, I may have overlooked one key point.
>
>>> columns and timestamps.  After the relative encoding is done, a block
>>> of key/value pairs is then compressed with gzip.
>
> Are the keys+values compressed together as one block? If that's the
> case, I can see why it's not possible to decompress only the keys and leave
> the values compressed.

Yes, it currently compresses a sequence of key/value pairs into a single block.
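Since a block of keys and values is compressed as one unit, values cannot be left compressed selectively on the server side. The workaround Ameet describes below (application-level Value compression on top of the table's normal gzip) can be done with plain java.util.zip: gzip the value bytes before building the Mutation, and gunzip the bytes of each Value returned by the Scanner on the client. This is only a minimal sketch under those assumptions; the class and method names are hypothetical, not part of the Accumulo API.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical client-side helpers for application-level value compression.
public final class ValueCodec {

    // Compress raw value bytes before they are wrapped in an Accumulo Value.
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(raw);
        gz.close(); // finishes the gzip stream so buf holds the complete output
        return buf.toByteArray();
    }

    // Decompress the bytes of a Value returned by a scan.
    public static byte[] decompress(byte[] compressed) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n = gz.read(chunk); n != -1; n = gz.read(chunk)) {
            out.write(chunk, 0, n);
        }
        gz.close();
        return out.toByteArray();
    }
}

With helpers like these, the application calls compress() on its large payload before putting it into a Mutation and decompress() on each Value it reads back, so the values stay compressed over the wire regardless of what the tablet server does with keys.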

>
> Also, I've switched to double compression as per the previous posts and
> it's working nicely. I see about 10-15% more compression than with just
> application-level Value compression.
>
> Thanks for your responses,
> Ameet
>
> On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>> need to decompress keys on the server side to compare them.  Also,
>>> iterators on the server side need the keys and values decompressed.
>>
>> Keys I understand, but why do values need to be decompressed if there are
>> no user iterators installed on the server? Are there system iterators that
>> look inside the value?
>>
>> Ameet
>>
>> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>>
>>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>> >
>>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>>> > default and that data is decompressed by the tablet server, so data on
>>> > the wire between server and client is uncompressed. Is there a way to
>>> > shift the decompression from the server to the client? I have a use case
>>> > where each Value in my table is relatively large (~8 MB), so I can
>>> > benefit from compression over the wire. I don't have any server-side
>>> > iterators, so the values don't need to be decompressed by the tablet
>>> > server. Also, each scan returns only a few rows, so client-side
>>> > decompression can be fast.
>>> >
>>> > The only way I can think of now is to disable compression on that table
>>> > and handle compression/decompression in the application (a config sketch
>>> > for this appears at the end of this thread). But if there is a way to do
>>> > this in Accumulo, I'd prefer that.
>>> >
>>>
>>> There are two levels of compression in Accumulo.  First, redundant
>>> parts of the key are not stored.  If the row in a key is the same as
>>> the previous row, then it's not stored again.  The same is done for
>>> columns and timestamps.  After the relative encoding is done, a block
>>> of key/value pairs is then compressed with gzip.
>>>
>>> As data is read from an RFile, when the row of a key is the same as
>>> the previous key's, it will just point to the previous key's row.  This is
>>> carried forward over the wire: as keys are transferred, duplicate
>>> fields in the key are not transferred again.
>>>
>>> As far as decompressing on the client side vs. the server side, the server
>>> at least needs to decompress keys.  On the server side you usually
>>> need to read from multiple sorted files and merge the results, so you
>>> need to decompress keys on the server side to compare them.  Also,
>>> iterators on the server side need the keys and values decompressed.
>>>
>>> > Thanks,
>>> > Ameet
>>
>>
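For the option Ameet mentions above of disabling the table's built-in compression and handling it entirely in the application, a hedged sketch using the Accumulo Java client API follows. The table name is hypothetical, and the assumption is that the per-table property controlling RFile block compression is table.file.compress.type (default "gz") as documented for Accumulo 1.4; it would only affect files written after the change (flushes and compactions).

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.Connector;

// Sketch: turn off RFile block compression for one table so the application
// can manage compression of its own values instead.
public final class DisableTableCompression {
    public static void disable(Connector connector, String tableName)
            throws AccumuloException, AccumuloSecurityException {
        // Assumption: "table.file.compress.type" is the per-table setting
        // (default "gz"); "none" disables block compression for new files.
        connector.tableOperations().setProperty(
                tableName, "table.file.compress.type", "none");
    }
}

Note that even with block compression disabled, the relative key encoding Keith describes still applies; only the values become the application's responsibility.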