HBase >> mail # dev >> RPC KeyValue encoding


Re: RPC KeyValue encoding
Thanks for the update, Matt.

W.r.t. the Cell class: since it is so fundamental, should it reside in the
org.apache.hadoop.hbase namespace, as the KeyValue class does?
For CellAppender, is compile() equivalent to flushing?

Looking forward to your publishing on the reviewboard.

On Sat, Sep 1, 2012 at 11:29 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> RPC encoding would be really nice since there is sometimes significant wire
> traffic that could be reduced many-fold.  I have a particular table that I
> scan and stream to a gzipped output file on S3, and I've noticed that while
> the app server's network input is 100Mbps, the gzipped output can be 2Mbps!
>
> Finishing the PrefixTree has been slow because I've saved a couple of tricky
> issues for the end and am light on time.  I'll try to put it on reviewboard
> Monday despite a known bug.  It is built with some of the ideas you mention
> in mind, Lars.  Take a look at the
> Cell <https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/Cell.java>
> and CellAppender <https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/appender/CellAppender.java>
> classes
> and their comments.  The idea with the CellAppender is to stream cells into
> it and periodically compile()/flush() into a byte[] which can be saved to
> an HFile or (eventually) sent over the wire.  For example, in
> HRegion.get(..), the CellAppender would replace the "ArrayList<KeyValue>
> results" collection.
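
A minimal sketch of the stream-then-compile idea described above, for illustration only. It is not the CellAppender from the prefix-tree branch: the class name SketchCellAppender, the append(row, value) signature, and the length-prefixed layout are all assumptions, and a real encoder would compress shared prefixes rather than copy bytes verbatim. Here compile() plays the role of a flush: it emits everything appended so far as one byte[] and resets the appender.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; not the CellAppender from the prefix-tree branch.
final class SketchCellAppender {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int cellCount = 0;

    // Append one cell as length-prefixed row bytes followed by length-prefixed value bytes.
    void append(byte[] row, byte[] value) {
        writeInt(row.length);
        buffer.write(row, 0, row.length);
        writeInt(value.length);
        buffer.write(value, 0, value.length);
        cellCount++;
    }

    // Number of cells appended since the last compile().
    int cellCount() {
        return cellCount;
    }

    // Emit everything appended so far as one byte[] and reset for the next batch.
    byte[] compile() {
        byte[] block = buffer.toByteArray();
        buffer.reset();
        cellCount = 0;
        return block;
    }

    // Write a 4-byte big-endian integer.
    private void writeInt(int v) {
        buffer.write(v >>> 24);
        buffer.write(v >>> 16);
        buffer.write(v >>> 8);
        buffer.write(v);
    }
}
```

A caller would append cells as a scan produces them and call compile() whenever a block should be written out or shipped, instead of collecting individual objects in a list.
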
>
> After introducing the Cell interface, the trick to extending the encoded
> cells up the HBase stack will be to reduce the reliance on stand-alone
> KeyValues.  We'll want things like the Filters and KeyValueHeap to be able
> to operate on reused Cells without materializing them into full KeyValues.
> That means that something like StoreFileScanner.peek() will not work
> because the scanner cannot maintain the state of the current and next
> Cells at the same time.  See
> CellCollator <https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/collator/CellCollator.java>
> for
> a possible replacement for KeyValueHeap.  The good news is that this can be
> done in stages without major disruptions to the code base.
>
> Looking at PtDataBlockEncoderSeeker <https://github.com/hotpads/hbase/blob/prefix-tree/hbase-prefix-tree/src/main/java/org/apache/hbase/codec/prefixtree/PtDataBlockEncoderSeeker.java>,
> this would mean transitioning from the getKeyValue() method that creates
> and fills a new KeyValue every time it's called to the getCurrentCell()
> method which returns a reference to a Cell buffer that is reused as the
> scanner proceeds.  Modifying a reusable Cell buffer rather than rapidly
> shooting off new KeyValues should drastically reduce byte[] copying and
> garbage churn.
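
To make the contrast between the two access styles concrete, here is a simplified stand-in, not the PtDataBlockEncoderSeeker API. SketchSeeker, ReusableCell, and getRowCopy() are hypothetical names, and cells are reduced to a row byte[] for brevity. It also shows why a peek()-style call is awkward with reuse: a reference obtained before next() shows the new cell's contents afterwards.

```java
import java.util.Arrays;

// Hypothetical names; a simplified stand-in, not the PtDataBlockEncoderSeeker API.
final class SketchSeeker {
    // A mutable cell view that the seeker overwrites on every advance.
    static final class ReusableCell {
        byte[] row = new byte[0];
        int rowLength = 0;
    }

    private final byte[][] rows;
    private int position = -1;
    private final ReusableCell current = new ReusableCell();

    SketchSeeker(byte[][] rows) {
        this.rows = rows;
    }

    // Move to the next cell, refilling the shared buffer in place.
    boolean next() {
        if (position + 1 >= rows.length) {
            return false;
        }
        position++;
        byte[] src = rows[position];
        if (current.row.length < src.length) {
            current.row = new byte[src.length]; // grow only when needed
        }
        System.arraycopy(src, 0, current.row, 0, src.length);
        current.rowLength = src.length;
        return true;
    }

    // Reuse style: returns the same object every call; contents are valid only until next().
    ReusableCell getCurrentCell() {
        return current;
    }

    // Allocation style: a fresh copy per call, safe to hold but it creates garbage.
    byte[] getRowCopy() {
        return Arrays.copyOf(current.row, current.rowLength);
    }
}
```

The trade-off is that callers of the reuse style must finish with a cell, or copy what they need, before advancing the scanner.
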
>
> I wish I understood protocol buffers better so I could comment
> specifically on that.  The result sent to the client can possibly be a
> plain old encoded data block (byte[]/ByteBuffer) with a similar header to
> the one encoded blocks have on disk (2 byte DataBlockEncoding id).  The
> client then uses the same
> CellScanner <https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/scanner/CellScanner.java> that
> the server uses when reading blocks from the block cache.  A nice
> side-effect of sending the client an encoded byte[] is that the java client
> can run the same decoder that the server uses which should be tremendously
> faster and more memory efficient than the current method of building a
> pointer-heavy result map.  I had envisioned this kind of thing being baked
> into ClientV2, but I guess it could be wrangled into the current one if
> someone wanted.
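
A rough sketch of the framing suggested above: an already-encoded block prefixed with a 2-byte encoding id, which the receiver reads to choose a decoder. The class SketchEncodedBlock, its method names, and the big-endian layout are assumptions made for illustration; this is not the HBase on-disk or wire format, and the id values are placeholders.

```java
import java.nio.ByteBuffer;

// Hypothetical framing; ids and layout are assumptions, not the HBase wire format.
final class SketchEncodedBlock {
    static final int HEADER_BYTES = 2;

    // Prefix an already-encoded payload with a 2-byte encoding id.
    static byte[] frame(short encodingId, byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        out.putShort(encodingId); // ByteBuffer defaults to big-endian
        out.put(payload);
        return out.array();
    }

    // Read the encoding id so the receiver can pick a matching decoder.
    static short readEncodingId(byte[] block) {
        return ByteBuffer.wrap(block).getShort();
    }

    // View the payload without copying it.
    static ByteBuffer payload(byte[] block) {
        return ByteBuffer.wrap(block, HEADER_BYTES, block.length - HEADER_BYTES).slice();
    }
}
```

Because payload() returns a view rather than a copy, a client-side scanner could in principle iterate over the received bytes directly, which is the memory saving the paragraph above is pointing at.
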
>
> food for thought... cheers,
> Matt
>
> P.S. I'm travelling tomorrow, so I may be silent on email
>
> On Sat, Sep 1, 2012 at 9:03 PM, lars hofhansl <[EMAIL PROTECTED]> wrote: