Accumulo user mailing list: compressing values returned to scanner


ameet kini 2012-10-02, 18:30
Re: compressing values returned to scanner
On re-reading your response, I realize I may have overlooked one key point.

>> columns and timestamps.  After the relative encoding is done, a block
>> of key-value pairs is then compressed with gzip.

Are the keys+values compressed together as one block? If that's the
case, I can see why it's not possible to decompress only the keys and
leave the values compressed.

Also, I've switched to double compression as per the previous posts, and
it's working nicely. I see about 10-15% more compression over just
application-level Value compression.
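
A minimal sketch of what this application-level handling can look like
on the client, assuming gzip via java.util.zip and the 1.4 Value class
(GzipValues is a made-up helper name, not part of the Accumulo API):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.accumulo.core.data.Value;

// Hypothetical helper: gzip Values in the application so the bytes stay
// compressed over the wire and are only inflated at the client.
public class GzipValues {

    // Compress raw bytes before putting them in a Mutation.
    public static Value compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(raw);
        gz.close();  // finishes the gzip stream and flushes the trailer
        return new Value(bos.toByteArray());
    }

    // Inflate a Value returned by a Scanner, on the client side.
    public static byte[] decompress(Value v) throws IOException {
        GZIPInputStream gz =
            new GZIPInputStream(new ByteArrayInputStream(v.get()));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}

With ~8MB values and scans that return only a few rows, the per-Value
inflate cost on the client should be small compared to the wire savings.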

Thanks for your responses,
Ameet

On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>> need to decompress keys on the server side to compare them.  Also,
>> iterators on the server side need the keys and values decompressed.
>
> Keys I understand, but why do values need to be decompressed if there
> are no user iterators installed on the server? Are there system
> iterators that look inside the value?
>
> Ameet
>
> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>
>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>> >
>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>> > default and that data is decompressed by the tablet server, so data
>> > on the wire between server and client is uncompressed. Is there a way
>> > to shift the decompression from happening on the server to the
>> > client? I have a use case where each Value in my table is relatively
>> > large (~8MB) and I can benefit from compression over the wire. I
>> > don't have any server-side iterators, so the values don't need to be
>> > decompressed by the tablet server. Also, each scan returns only a few
>> > rows, so client-side decompression can be fast.
>> >
>> > The only way I can think of now is to disable compression on that
>> > table and handle compression/decompression in the application. But
>> > if there is a way to do this in Accumulo, I'd prefer that.
>> >
>>
>> There are two levels of compression in Accumulo.  First, redundant
>> parts of the key are not stored.  If the row in a key is the same as
>> the previous row, then it's not stored again.  The same is done for
>> columns and timestamps.  After the relative encoding is done, a block
>> of key-value pairs is then compressed with gzip.
>>
>> As data is read from an RFile, when the row of a key is the same as
>> that of the previous key, it will just point to the previous key's
>> row.  This is carried forward over the wire: as keys are transferred,
>> duplicate fields are not sent again.
>>
>> As far as decompressing on the client side vs. the server side, the
>> server at least needs to decompress keys.  On the server side you
>> usually need to read from multiple sorted files and merge the results
>> in order, so you need to decompress keys on the server side to compare
>> them.  Also, iterators on the server side need the keys and values
>> decompressed.
>>
>> > Thanks,
>> > Ameet
>
>
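
For completeness, the per-table setting Ameet's original question
alludes to is the table.file.compress.type property; setting it to
"none" turns off the gzip block compression (the relative key encoding
is part of the RFile format and should still apply). A minimal sketch
using the 1.4 Java client API, with placeholder instance name,
zookeepers, credentials, and table name:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class DisableTableCompression {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Instance inst = new ZooKeeperInstance("myinstance", "zkhost:2181");
        Connector conn = inst.getConnector("user", "secret".getBytes());

        // Turn off gzip block compression for one table. Only files
        // written after the change (e.g. by the next flush or
        // compaction) are affected.
        conn.tableOperations().setProperty(
            "mytable", "table.file.compress.type", "none");
    }
}

The shell equivalent would be:
config -t mytable -s table.file.compress.type=none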
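
And to make the relative key encoding Keith describes concrete, here is
a toy illustration: it prints a marker instead of re-emitting a key
field that matches the previous key. This shows only the idea, not the
actual RFile encoding.

import java.util.Arrays;
import java.util.List;

// Toy sketch of relative key encoding: emit a field only when it
// differs from the same field of the previous key.
public class RelativeEncodingDemo {
    public static void main(String[] args) {
        List<String[]> keys = Arrays.asList(
            new String[] {"row1", "colA", "ts1"},
            new String[] {"row1", "colB", "ts1"},   // row repeats
            new String[] {"row2", "colB", "ts1"});  // column, ts repeat
        String[] prev = null;
        for (String[] k : keys) {
            StringBuilder enc = new StringBuilder();
            for (int i = 0; i < k.length; i++) {
                // A real encoder would write a flag bit; print a
                // marker here instead.
                if (prev != null && k[i].equals(prev[i])) {
                    enc.append("<same> ");
                } else {
                    enc.append(k[i]).append(' ');
                }
            }
            System.out.println(enc.toString().trim());
            prev = k;
        }
        // Output:
        // row1 colA ts1
        // <same> colB <same>
        // row2 <same> <same>
    }
}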