HBase >> mail # dev >> RPC KeyValue encoding

Re: RPC KeyValue encoding
I think what I was proposing was different (but I might also misunderstand the discussion in the JIRA).

Currently when we serialize a KV we take its buffer (identified by buffer, offset, and length) and write it on the wire.
What I was thinking is to just be more selective about what we copy (rather than actually compressing the data, which involves extra memory copies, etc).

That is, we write the first KV in full. Then (as suggested by Gregory) we write three bits (in a single byte) indicating whether the next KV reuses the same Row/CF/Column.
So we get an overhead of one byte per KV. On the write side we'd have to do an extra compare of the Row/CF/Column as we go and write the extra byte.
On the read side we'd have to do some juggling to piece the KV back together, but the overall memory operations should be the same (it would be nice to have a scatter/gather-type implementation of KV to avoid that).
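A rough sketch of the write side of that scheme, for concreteness. This is not HBase's actual KeyValue or any real codec class: SimpleKV, the flag constants, and the toy one-byte length prefix are all made up for illustration; a real implementation would work against the KV's backing buffer with proper varint lengths.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical simplified KV: just the key components and the value.
class SimpleKV {
    final byte[] row, family, qualifier, value;
    SimpleKV(byte[] r, byte[] f, byte[] q, byte[] v) {
        row = r; family = f; qualifier = q; value = v;
    }
}

class DiffEncoder {
    // Three bits in the per-KV flag byte: which key components
    // repeat the previous KV and can therefore be omitted.
    static final int SAME_ROW = 1, SAME_FAMILY = 2, SAME_QUALIFIER = 4;

    // First KV is written in full (flags == 0); each later KV costs
    // one extra flag byte plus only the components that changed.
    static byte[] encode(SimpleKV[] kvs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SimpleKV prev = null;
        for (SimpleKV kv : kvs) {
            int flags = 0;
            if (prev != null) {
                // The extra compares done on the write side as we go.
                if (Arrays.equals(kv.row, prev.row))             flags |= SAME_ROW;
                if (Arrays.equals(kv.family, prev.family))       flags |= SAME_FAMILY;
                if (Arrays.equals(kv.qualifier, prev.qualifier)) flags |= SAME_QUALIFIER;
            }
            out.write(flags);                       // one byte of overhead per KV
            if ((flags & SAME_ROW) == 0)       writeField(out, kv.row);
            if ((flags & SAME_FAMILY) == 0)    writeField(out, kv.family);
            if ((flags & SAME_QUALIFIER) == 0) writeField(out, kv.qualifier);
            writeField(out, kv.value);              // values are always written
            prev = kv;
        }
        return out.toByteArray();
    }

    // Toy length prefix: assumes fields shorter than 256 bytes.
    static void writeField(ByteArrayOutputStream out, byte[] b) {
        out.write(b.length);
        out.write(b, 0, b.length);
    }
}
```

The read side would walk the stream, and for each set bit pull that component from the previously decoded KV; that copy-back is the "juggling" mentioned above, and a scatter/gather KV could reference those bytes instead of copying them.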
-- Lars

 From: Stack <[EMAIL PROTECTED]>
Sent: Tuesday, September 4, 2012 2:29 PM
Subject: Re: RPC KeyValue encoding
On Mon, Sep 3, 2012 at 11:24 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> Different implementations will have different performance
> characteristics where some may be better for disk and others for RPC, but
> the overall intent is the same.

A while back, custom compression of KVs before putting data on the
wire was tried and abandoned because of the added latency.  Our fb
brethren have actually committed a patch to do rpc compression to
89-fb because of throughput improvements and how it helps with network
congestion.  Check out this issue for discussion,