Accumulo user mailing list: compressing values returned to scanner


ameet kini 2012-10-01, 19:03
Keith Turner 2012-10-02, 18:24

Re: compressing values returned to scanner

> need to decompress keys on the server side to compare them.  Also
> iterators on the server side need the keys and values decompressed.

Keys, I understand, but why do values need to be decompressed if there
are no user iterators installed on the server? Are there system
iterators that look inside the value?

Ameet

On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[EMAIL PROTECTED]> wrote:
> >
> > My understanding of compression in Accumulo 1.4.1 is that it is on by
> > default and that data is decompressed by the tablet server, so data on
> > the wire between server and client is decompressed. Is there a way to
> > shift the decompression from happening on the server to the client? I
> > have a use case where each Value in my table is relatively large (~8MB)
> > and I can benefit from compression over the wire. I don't have any
> > server-side iterators, so the values don't need to be decompressed by
> > the tablet server. Also, each scan returns only a few rows, so
> > client-side decompression can be fast.
> >
> > The only way I can think of now is to disable compression on that
> > table and handle compression/decompression in the application. But if
> > there is a way to do this in Accumulo, I'd prefer that.
> >
>
> There are two levels of compression in Accumulo.  First, redundant
> parts of the key are not stored.  If the row in a key is the same as
> the previous row, then it's not stored again.  The same is done for
> columns and timestamps.  After the relative encoding is done, a block
> of key-value pairs is then compressed with gzip.
>
> As data is read from an RFile, when the row of a key is the same as
> the previous key's row, it will just point to the previous key's row.
> This is carried forward over the wire: as keys are transferred,
> duplicate fields in the key are not transferred.
>
> As far as decompressing on the client side vs. the server side, the
> server at least needs to decompress keys.  On the server side you
> usually need to read from multiple sorted files and order the result,
> so you need to decompress keys on the server side to compare them.
> Also, iterators on the server side need the keys and values
> decompressed.
>
> > Thanks,
> > Ameet
>
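A minimal sketch of the application-level approach Ameet describes above:
turn off the table's file compression so the tablet server never gzips the
values, and compress/decompress each large value in the client. This
assumes the 1.4-era Connector API; table.file.compress.type is a standard
table property, but the class and helper names here are made up for
illustration:

    import java.io.*;
    import java.util.Map.Entry;
    import java.util.zip.*;
    import org.apache.accumulo.core.client.*;
    import org.apache.accumulo.core.data.*;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.io.Text;

    public class ClientSideCompression {

        // Disable RFile compression for the table so the tablet server
        // stores and returns value bytes as-is.
        static void disableTableCompression(Connector conn, String table)
                throws AccumuloException, AccumuloSecurityException {
            conn.tableOperations().setProperty(table,
                "table.file.compress.type", "none");
        }

        static byte[] gzip(byte[] data) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream out = new GZIPOutputStream(bos);
            out.write(data);
            out.close();
            return bos.toByteArray();
        }

        static byte[] gunzip(byte[] data) throws IOException {
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; )
                bos.write(buf, 0, n);
            return bos.toByteArray();
        }

        // Compress the ~8MB value in the client before writing it.
        static void writeCompressed(Connector conn, String table,
                String row, byte[] bigValue) throws Exception {
            BatchWriter bw = conn.createBatchWriter(table, 1000000L, 1000L, 2);
            Mutation m = new Mutation(new Text(row));
            m.put(new Text("cf"), new Text("cq"), new Value(gzip(bigValue)));
            bw.addMutation(m);
            bw.close();
        }

        // Decompress values in the client after scanning.
        static void scanCompressed(Connector conn, String table) throws Exception {
            Scanner scanner = conn.createScanner(table, new Authorizations());
            for (Entry<Key,Value> e : scanner) {
                byte[] original = gunzip(e.getValue().get());
                // use original ...
            }
        }
    }

One caveat worth noting: with file compression set to none, the keys in
that table's RFiles lose their gzip pass as well (though the relative
encoding Keith describes still applies), so this trades some on-disk key
compression for the wire savings on values.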
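To make Keith's first level concrete, here is a toy sketch of relative key
encoding. It is not Accumulo's actual RFile format: one flag byte per key
records which fields match the previous key, and only the fields that
differ are written; blocks of pairs encoded this way are what then get
gzipped.

    import java.io.*;
    import java.util.List;

    public class RelativeEncodingSketch {
        static final int SAME_ROW = 0x1, SAME_COL = 0x2, SAME_TS = 0x4;

        // Keys must arrive sorted; each key is {row, column, timestamp}.
        static void encode(DataOutputStream out, List<String[]> sortedKeys)
                throws IOException {
            String[] prev = null;
            for (String[] k : sortedKeys) {
                int flags = 0;
                if (prev != null) {
                    if (k[0].equals(prev[0])) flags |= SAME_ROW;
                    if (k[1].equals(prev[1])) flags |= SAME_COL;
                    if (k[2].equals(prev[2])) flags |= SAME_TS;
                }
                out.writeByte(flags);  // one byte stands in for repeated fields
                if ((flags & SAME_ROW) == 0) out.writeUTF(k[0]);
                if ((flags & SAME_COL) == 0) out.writeUTF(k[1]);
                if ((flags & SAME_TS) == 0) out.writeUTF(k[2]);
                prev = k;
            }
        }
    }

With many entries per row, each repeated row costs a flag byte instead of
the full row bytes, and the same relative form can be carried over the
wire to the client, which decodes it the same way the server does.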
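And a sketch of why the server must at least decode keys even with no user
iterators installed: a scan typically merges several sorted RFiles, and a
heap-based merge has to compare complete keys, which a relatively encoded
entry only yields once the previous key has been reconstructed. Plain
strings stand in for Accumulo keys here:

    import java.util.*;

    public class MergeSketch {

        // One entry per sorted input file: the current (decoded) key plus
        // the iterator it came from.
        static class Cursor implements Comparable<Cursor> {
            String key;                 // must be fully decoded to compare
            Iterator<String> rest;

            Cursor(String key, Iterator<String> rest) {
                this.key = key;
                this.rest = rest;
            }

            public int compareTo(Cursor o) { return key.compareTo(o.key); }
        }

        // k-way merge of sorted runs, as a tablet server does across RFiles.
        static List<String> merge(List<Iterator<String>> runs) {
            PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>();
            for (Iterator<String> it : runs)
                if (it.hasNext()) heap.add(new Cursor(it.next(), it));
            List<String> out = new ArrayList<String>();
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.add(c.key);         // ordering above needed the real key
                if (c.rest.hasNext()) heap.add(new Cursor(c.rest.next(), c.rest));
            }
            return out;
        }
    }

Only key comparison drives the merge, so it is the keys, not the values,
that the tablet server is forced to materialize for ordering.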
Other messages in this thread:

ameet kini 2012-10-02, 18:48
Keith Turner 2012-10-02, 20:34
Keith Turner 2012-10-02, 18:55
Marc Parisi 2012-10-01, 19:19
William Slacum 2012-10-01, 19:32
ameet kini 2012-10-01, 19:40
William Slacum 2012-10-01, 20:00
Marc Parisi 2012-10-01, 20:26
Marc Parisi 2012-10-01, 20:44
David Medinets 2012-10-02, 00:35
ameet kini 2012-10-01, 19:27