Accumulo user mailing list: compressing values returned to scanner


Thread:
- ameet kini 2012-10-01, 19:03
- Keith Turner 2012-10-02, 18:24
- ameet kini 2012-10-02, 18:30
- ameet kini 2012-10-02, 18:48
- Keith Turner 2012-10-02, 20:34
Re: compressing values returned to scanner
On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>> need to decompress  keys on the server side to compare them.  Also
>> iterators on the server side need the keys and values decompressed.
>
> keys, I understand, but why do values need to be decompressed if there were
> no user iterators installed on the server? Are there system iterators that
> look inside the value?

I do not think any of the default iterators look at the value.  You
could possibly compress the value and lazily decompress it as it is
needed by iterators.  It seems like each value would need to be
compressed individually; you would not be able to compress groups
of values.  I say this because values need to be interleaved as they
are read from multiple files and ordered, so you lose the ability to
pass back a group of compressed values w/o ever decompressing them.
Compressing each value separately may incur a lot of overhead for
smaller values.  For larger values it would be great.
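The overhead on small values is easy to see in practice: gzip's fixed header and trailer dominate a tiny payload, while a large value shrinks substantially. A minimal sketch, assuming gzip via java.util.zip (the class name here is just for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipOverhead {
    // Gzip a byte array; used to compare per-value compression cost.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] small = "tiny".getBytes();          // a 4-byte value
        byte[] large = new byte[8 * 1024 * 1024];  // ~8 MB, like Ameet's values
        Arrays.fill(large, (byte) 'x');

        // Fixed gzip framing makes the tiny value *grow* when compressed...
        System.out.println("small: " + small.length + " -> " + gzip(small).length);
        // ...while the large (here, highly repetitive) value shrinks a lot.
        System.out.println("large: " + large.length + " -> " + gzip(large).length);
    }
}
```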

Other than the iterator case, compressing values individually could be
done entirely on the client side with a wrapper around the APIs for
reading and writing.  Iterators that operate on a table w/ compressed
values could possibly extend an iterator that decompresses each value
as it is used.
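The client-side wrapper idea could be sketched roughly as follows. This is a hypothetical helper, not an Accumulo API; it only shows the compress/decompress core that such a wrapper would apply to value bytes before writing and after scanning:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical client-side wrapper core: values are gzipped before they
// are handed to the write API and gunzipped after the scan API returns
// them, so the tablet server only ever sees opaque compressed bytes.
public class CompressedValues {

    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```

On the write path one would call compress(raw) on the value bytes before building the mutation, and decompress(...) on each value a scanner returns; as Ameet suggests, table-level value compression could then be disabled for that table.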

>
> Ameet
>
> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>
>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>> >
>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>> > default and that data is decompressed by the tablet server, so data on
>> > the wire between server/client is decompressed. Is there a way to shift
>> > the decompression from happening on the server to the client? I have a
>> > use case where each Value in my table is relatively large (~8MB) and I
>> > can benefit from compression over the wire. I don't have any server
>> > side iterators, so the values don't need to be decompressed by the
>> > tablet server. Also, each scan returns a few rows, so client-side
>> > decompression can be fast.
>> >
>> > The only way I can think of now is to disable compression on that
>> > table, and handle compression/decompression in the application. But if
>> > there is a way to do this in Accumulo, I'd prefer that.
>> >
>>
>> There are two levels of compression in Accumulo.  First, redundant
>> parts of the key are not stored.  If the row in a key is the same as
>> the previous row, then it's not stored again.  The same is done for
>> columns and timestamps.  After the relative encoding is done, a block
>> of key values is then compressed with gzip.
>>
>> As data is read from an RFile, when the row of a key is the same as
>> the previous key it will just point to the previous key's row.  This is
>> carried forward over the wire.  As keys are transferred, duplicate
>> fields in the key are not transferred.
>>
>> As far as decompressing on the client side vs server side, the server
>> at least needs to decompress keys.  On the server side you usually
>> need to read from multiple sorted files and order the result.  So you
>> need to decompress  keys on the server side to compare them.  Also
>> iterators on the server side need the keys and values decompressed.
>>
>> > Thanks,
>> > Ameet
>
>
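The relative encoding Keith describes can be illustrated with a toy encoder. This is NOT the real RFile format, just the concept: a field identical to the previous key's field is replaced by a marker instead of being stored again, which works well because keys arrive sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy relative encoder for sorted (row, column) keys. A field equal to
// the previous key's field becomes a SAME marker rather than a copy.
public class RelativeKeys {
    public static final String SAME = "<same>";

    public static List<String[]> encode(List<String[]> keys) {
        List<String[]> out = new ArrayList<>();
        String prevRow = null, prevCol = null;
        for (String[] k : keys) {
            String row = k[0].equals(prevRow) ? SAME : k[0];
            String col = k[1].equals(prevCol) ? SAME : k[1];
            out.add(new String[] {row, col});
            prevRow = k[0];
            prevCol = k[1];
        }
        return out;
    }

    public static void main(String[] args) {
        // (r1,a), (r1,b), (r2,b) encodes as (r1,a), (<same>,b), (r2,<same>)
        List<String[]> keys = new ArrayList<>();
        keys.add(new String[] {"r1", "a"});
        keys.add(new String[] {"r1", "b"});
        keys.add(new String[] {"r2", "b"});
        for (String[] k : encode(keys)) {
            System.out.println(k[0] + " " + k[1]);
        }
    }
}
```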
Other replies in this thread:
- Marc Parisi 2012-10-01, 19:19
- William Slacum 2012-10-01, 19:32
- ameet kini 2012-10-01, 19:40
- William Slacum 2012-10-01, 20:00
- Marc Parisi 2012-10-01, 20:26
- Marc Parisi 2012-10-01, 20:44
- David Medinets 2012-10-02, 00:35
- ameet kini 2012-10-01, 19:27