Re: RPC KeyValue encoding
RPC encoding would be really nice since there is sometimes significant wire
traffic that could be reduced many-fold.  I have a particular table that I
scan and stream to a gzipped output file on S3, and I've noticed that while
the app server's network input is 100 Mbps, the gzipped output can be 2 Mbps!

Finishing the PrefixTree has been slow because I've saved a couple of tricky
issues to the end and am light on time.  I'll try to put it on ReviewBoard
Monday despite a known bug.  It is built with some of the ideas you mention
in mind, Lars.  Take a look at the
Cell<https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/Cell.java>
and
CellAppender<https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/appender/CellAppender.java>
classes and their comments.  The idea with the CellAppender is to stream
cells into it and periodically compile()/flush() into a byte[], which can be
saved to an HFile or (eventually) sent over the wire.  For example, in
HRegion.get(..), the CellAppender would replace the "ArrayList<KeyValue>
results" collection.

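To sketch the idea (the interfaces below are simplified placeholders, not
the actual Cell/CellAppender signatures in the branch):

interface Cell {
    byte[] getRow();
    byte[] getFamily();
    byte[] getQualifier();
    long getTimestamp();
    byte[] getValue();
}

interface CellAppender {
    void append(Cell cell); // stream cells in; no per-cell KeyValue allocation
    byte[] compile();       // flush everything appended so far into one encoded block
}

class GetSketch {
    // In HRegion.get(..) the appender would stand in for "ArrayList<KeyValue> results".
    byte[] doGet(Iterable<Cell> matchingCells, CellAppender appender) {
        for (Cell c : matchingCells) {
            appender.append(c);
        }
        // The compiled byte[] could go into an HFile block or, eventually,
        // straight into the RPC response.
        return appender.compile();
    }
}
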
After introducing the Cell interface, the trick to extending the encoded
cells up the HBase stack will be to reduce the reliance on stand-alone
KeyValues.  We'll want things like the Filters and KeyValueHeap to be able
to operate on reused Cells without materializing them into full KeyValues.
 That means that something like StoreFileScanner.peek() will not work
because the scanner cannot maintain the state of the current and next
Cells at the same time.  See
CellCollator<https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/collator/CellCollator.java>
for a possible replacement for KeyValueHeap.  The good news is that this
can be done in stages without major disruptions to the code base.

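To illustrate why peek() gets awkward, imagine a scanner that reuses one
mutable Cell buffer.  The names here are made up, and this reuses the
placeholder Cell/CellAppender interfaces from the sketch above:

interface ReusableCellScanner {
    boolean advance();     // move to the next cell, overwriting the shared buffer
    Cell getCurrentCell(); // returns the SAME object every call, freshly re-filled
}

class CollatorSketch {
    // A KeyValueHeap-style merge would compare scanners by their current
    // cells, advance only the winner, and make sure each cell is consumed
    // (copied or appended) before advance() overwrites it.
    void drain(ReusableCellScanner scanner, CellAppender out) {
        while (scanner.advance()) {
            out.append(scanner.getCurrentCell()); // consume before the buffer is reused
        }
    }
}

Since the scanner can only hold one cell's state at a time, there is no
peek() in that sketch.
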
Looking at PtDataBlockEncoderSeeker<https://github.com/hotpads/hbase/blob/prefix-tree/hbase-prefix-tree/src/main/java/org/apache/hbase/codec/prefixtree/PtDataBlockEncoderSeeker.java>,
this would mean transitioning from the getKeyValue() method that creates
and fills a new KeyValue every time it's called to the getCurrentCell()
method, which returns a reference to a Cell buffer that is reused as the
scanner proceeds.  Modifying a reusable Cell buffer rather than rapidly
shooting off new KeyValues should drastically reduce byte[] copying and
garbage churn.

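Very roughly, the contrast looks like this (the field names and the
MutableCell type are placeholders, not the seeker's real internals):

class MutableCell {
    byte[] rowBuf; int rowOff, rowLen;
    byte[] valBuf; int valOff, valLen;
    long timestamp;

    void setRow(byte[] buf, int off, int len)   { rowBuf = buf; rowOff = off; rowLen = len; }
    void setTimestamp(long ts)                  { timestamp = ts; }
    void setValue(byte[] buf, int off, int len) { valBuf = buf; valOff = off; valLen = len; }
}

class SeekerSketch {
    private final MutableCell reusedCell = new MutableCell(); // one long-lived cell
    private byte[] rowBuffer;   // decoder working buffers (placeholders)
    private byte[] blockBuffer;
    private int rowLength, valueOffset, valueLength;
    private long currentTimestamp;

    // getKeyValue() style: every call copies bytes into brand new objects.
    byte[] getKeyValueBytes() {
        byte[] copy = new byte[rowLength + valueLength]; // simplified; a real KeyValue copies the whole key
        System.arraycopy(rowBuffer, 0, copy, 0, rowLength);
        System.arraycopy(blockBuffer, valueOffset, copy, rowLength, valueLength);
        return copy; // fresh garbage for every cell
    }

    // getCurrentCell() style: re-point the single reused cell at the current position.
    MutableCell getCurrentCell() {
        reusedCell.setRow(rowBuffer, 0, rowLength);
        reusedCell.setTimestamp(currentTimestamp);
        reusedCell.setValue(blockBuffer, valueOffset, valueLength);
        return reusedCell; // same object, no new allocations or copies
    }
}
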
I wish I understood protocol buffers better so I could comment
specifically on that.  The result sent to the client could possibly be a
plain old encoded data block (byte[]/ByteBuffer) with a header similar to
the one encoded blocks have on disk (a 2-byte DataBlockEncoding id).  The
client then uses the same
CellScanner<https://github.com/hotpads/hbase/blob/prefix-tree/hbase-common/src/main/java/org/apache/hadoop/hbase/cell/scanner/CellScanner.java>
that the server uses when reading blocks from the block cache.  A nice
side effect of sending the client an encoded byte[] is that the Java client
can run the same decoder the server uses, which should be tremendously
faster and more memory-efficient than the current method of building a
pointer-heavy result map.  I had envisioned this kind of thing being baked
into ClientV2, but I guess it could be wrangled into the current one if
someone wanted.

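Something like this, just to show the shape of it (the framing and the
decoderFor factory are made up; only the 2-byte DataBlockEncoding id
mirrors the on-disk encoded block header, and ReusableCellScanner is the
placeholder interface from the earlier sketch):

import java.nio.ByteBuffer;

class EncodedResponseSketch {

    // Server side: prefix the encoded cell bytes with the 2-byte encoding id.
    static ByteBuffer frame(short dataBlockEncodingId, byte[] encodedCells) {
        ByteBuffer out = ByteBuffer.allocate(2 + encodedCells.length);
        out.putShort(dataBlockEncodingId); // tells the client which decoder to run
        out.put(encodedCells);
        out.flip();
        return out;
    }

    // Client side: read the id, pick the matching decoder, and walk the cells
    // with the same scanner the server uses on block-cache reads, rather than
    // building a pointer-heavy result map.
    static void readResponse(ByteBuffer response) {
        short encodingId = response.getShort();
        ReusableCellScanner scanner = decoderFor(encodingId, response.slice());
        while (scanner.advance()) {
            scanner.getCurrentCell(); // consume here before the buffer is reused
        }
    }

    static ReusableCellScanner decoderFor(short id, ByteBuffer encoded) {
        throw new UnsupportedOperationException("placeholder decoder factory");
    }
}
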
food for thought... cheers,
Matt

PS - I'm travelling tomorrow, so I may be silent on email

On Sat, Sep 1, 2012 at 9:03 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> In 0.96 we are changing the wire protocol to use protobufs.
>
> While we're at it, I am wondering whether we can optimize a few things:
>
>
> 1. A Put or Delete can send many KeyValues, all of which have the same row
> key and many will likely have the same column family.
> 2. Likewise a Scan result or Get is for a single row. Each KV will again
> have the same row key and many will have the same column family.
> 3. The client and server do not need to share the same KV implementation