Accumulo, mail # user - compressing values returned to scanner


Re: compressing values returned to scanner
Keith Turner 2012-10-02, 20:34
On Tue, Oct 2, 2012 at 2:48 PM, ameet kini <[EMAIL PROTECTED]> wrote:
> In re-reading your response, I may have overlooked one key point.
>
>>> columns and timestamps. After the relative encoding is done, a block
>>> of key-value pairs is then compressed with gzip.
>
> Are the keys+values compressed together as one block? If that's the
> case, I can see why it's not possible to only decompress keys and leave
> values compressed.

Yes, it currently compresses a sequence of key-value pairs into a single block.
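Since the whole block is compressed together, the workaround Ameet lands on is to gzip each Value in the application before writing it, and gunzip after scanning. A minimal sketch of that application-level handling (the helper class and method names here are hypothetical; only `java.util.zip` is used, no Accumulo API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helpers: gzip a value's bytes before putting them in a
// Mutation, and gunzip them after reading an entry back from a Scanner.
// Keys are left untouched, so the tablet server can still compare them.
public class ValueCodec {

    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```

With 8 MB values and only a few rows per scan, as in Ameet's use case, the client-side decompression cost is small compared to the bytes saved on the wire.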

>
> Also, I've switched to double compression as per previous posts and
> it's working nicely. I see about 10-15% more compression over just
> application-level Value compression.
>
> Thanks for your responses,
> Ameet
>
> On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>> need to decompress keys on the server side to compare them. Also
>>> iterators on the server side need the keys and values decompressed.
>>
>> Keys, I understand, but why do values need to be decompressed if there are
>> no user iterators installed on the server? Are there system iterators that
>> look inside the value?
>>
>> Ameet
>>
>> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>>
>>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>> >
>>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>>> > default and that data is decompressed by the tablet server, so data on
>>> > the
>>> > wire between server/client is decompressed. Is there a way to shift the
>>> > decompression from happening on the server to the client? I have a use
>>> > case
>>> > where each Value in my table is relatively large (~ 8MB) and I can
>>> > benefit
>>> > from compression over the wire. I don't have any server side iterators,
>>> > so
>>> > the values don't need to be decompressed by the tablet server. Also,
>>> > each
>>> > scan returns a few rows, so client-side decompression can be fast.
>>> >
>>> > The only way I can think of now is to disable compression on that table,
>>> > and
>>> > handle compression/decompression in the application. But if there is a
>>> > way
>>> > to do this in Accumulo, I'd prefer that.
>>> >
>>>
>>> There are two levels of compression in Accumulo. First, redundant
>>> parts of the key are not stored. If the row in a key is the same as
>>> the previous row, then it's not stored again. The same is done for
>>> columns and timestamps. After the relative encoding is done, a block
>>> of key-value pairs is then compressed with gzip.
>>>
>>> As data is read from an RFile, when the row of a key is the same as
>>> the previous key's, it will just point to the previous key's row. This
>>> is carried forward over the wire: as keys are transferred, duplicate
>>> fields in the key are not transferred.
>>>
>>> As far as decompressing on the client side vs the server side, the
>>> server at least needs to decompress keys. On the server side you
>>> usually need to read from multiple sorted files and order the
>>> results, so you need to decompress keys on the server side to compare
>>> them. Also, iterators on the server side need the keys and values
>>> decompressed.
>>>
>>> > Thanks,
>>> > Ameet
>>
>>
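As a rough illustration of the relative encoding Keith describes, here is a toy sketch (not Accumulo's actual RFile format) of the core idea: when consecutive keys share a field, a short marker is stored instead of the field bytes. The class, method, and marker names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of relative key encoding: if a key's row matches the
// previous key's row, emit a one-token "SAME" marker instead of
// repeating the row bytes. Accumulo applies the same idea to columns
// and timestamps before gzip-compressing the whole block.
public class RelativeRowEncoder {

    public static List<String> encode(List<String> rows) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String row : rows) {
            if (row.equals(prev)) {
                out.add("SAME");   // marker: reuse previous row
            } else {
                out.add(row);      // row changed: store it in full
            }
            prev = row;
        }
        return out;
    }
}
```

This is why sorting and merging on the server work without fully re-materializing every key: a "SAME" marker can simply point back at the previous key's row, and that sharing carries over the wire as duplicate fields are skipped during transfer.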