HBase >> mail # dev >> RPC KeyValue encoding

Re: RPC KeyValue encoding
That reminds me of another thought that occurred to me while looking at ScanQueryMatcher.
I was marveling at the relative complexity of it (together with StoreScanner) - admittedly some of this is my fault (see HBASE-4536 and HBASE-4071)

It would be so much easier to read if we had proper iterator trees (at least in some places in the code), similar to how relational databases express execution plans (using scanner and iterator interchangeably).

- StoreScanner would just read from HFiles and pass KVs up, in fact it might no longer be needed
- Filters can be expressed as an iterator over those KVs.
- Handling deleted KVs would be another iterator
- So would the version handling/counting
- Scanners would not need to be passed a List<KeyValue> to accumulate KVs in, but could simply return KVs as they are encountered.

RegionScanner and StoreScanner would retain the KeyValueHeap to mergesort their sub scanners.
The overall complexity would remain the same, but the parts would be isolated better.

Something like this:
RegionScanner -> HeapIt ->-> VersionCounterIt -> FilterIt -> TimeRangeIt -> DeletesIt -> StoreScanner -> HeapIt ->-> StoreFileIt -> ...
(Should rename some of the things)

All iterators would issue (re)seeks when possible.

The iterator interface would be something like <init>, KV next(), close(), seek(KV), reseek(KV). Many Iterators would be stateful.
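
A minimal sketch of what such an interface and one stacked iterator might look like. All names here (KV, KVIterator, ListIt, FilterIt) are hypothetical stand-ins for illustration, not existing HBase classes; KV is reduced to a row and a value, and the leaf iterator reads from a sorted in-memory list instead of an HFile.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for KeyValue: only a row key and a value. */
final class KV implements Comparable<KV> {
    final String row;
    final String value;
    KV(String row, String value) { this.row = row; this.value = value; }
    @Override public int compareTo(KV other) { return row.compareTo(other.row); }
}

/** Hypothetical iterator interface with the methods listed in the mail. */
interface KVIterator {
    KV next();            // returns null once the iterator is exhausted
    void seek(KV key);    // position at the first KV >= key, searching from the start
    void reseek(KV key);  // like seek, but only moves forward from the current position
    void close();
}

/** Leaf iterator over a sorted list, standing in for a store file scanner. */
final class ListIt implements KVIterator {
    private final List<KV> sorted;
    private int pos = 0;
    ListIt(List<KV> sorted) { this.sorted = sorted; }
    @Override public KV next() { return pos < sorted.size() ? sorted.get(pos++) : null; }
    @Override public void seek(KV key) { pos = 0; reseek(key); }
    @Override public void reseek(KV key) {
        while (pos < sorted.size() && sorted.get(pos).compareTo(key) < 0) pos++;
    }
    @Override public void close() { }
}

/** A filter expressed as an iterator that wraps another iterator. */
final class FilterIt implements KVIterator {
    private final KVIterator child;
    private final Predicate<KV> keep;
    FilterIt(KVIterator child, Predicate<KV> keep) { this.child = child; this.keep = keep; }
    @Override public KV next() {
        KV kv;
        // Pull from the child until a KV passes the filter or the child runs out.
        while ((kv = child.next()) != null) {
            if (keep.test(kv)) return kv;
        }
        return null;
    }
    // Seeks are simply pushed down to the child.
    @Override public void seek(KV key) { child.seek(key); }
    @Override public void reseek(KV key) { child.reseek(key); }
    @Override public void close() { child.close(); }
}
```

Stacking is then just construction, e.g. new FilterIt(new ListIt(kvs), kv -> !kv.value.isEmpty()). Delete handling, time ranges, and version counting would wrap each other the same way, each holding only its own state.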

Would probably be a major refactoring, and the devil would be in the details. We would need to be careful to keep the current performance (currently ScanQueryMatcher is efficient, because it does a bunch of steps at the same time).

Just "blue skying".

-- Lars
From: Matt Corgan <[EMAIL PROTECTED]>
Sent: Monday, September 3, 2012 11:24 AM
Subject: Re: RPC KeyValue encoding

> For CellAppender, is compile() equivalent to flushing?

Yes.  I'll rename CellAppender to CellOutputStream.  The concept is very
similar to a GzipOutputStream where you write bytes to it and periodically
call flush() which spits out a compressed byte[] behind the scenes.  The
server would write Cells to a CellOutputStream, flush them to a byte[] and
send the byte[] to the client.  There could be a default encoding, and the
client could send a flag to override the default.
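
A rough sketch of the write-then-flush shape described above. The class body is an assumption made for illustration: the length-prefixed layout is a placeholder, not an HBase wire format, and a real implementation would plug in whichever encoder the client and server agree on.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch: cells are written in, and flush() hands back one byte[] block. */
final class CellOutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Appends one cell as [keyLen][key][valueLen][value] (placeholder layout). */
    void write(byte[] key, byte[] value) {
        writeInt(key.length);
        buffer.write(key, 0, key.length);
        writeInt(value.length);
        buffer.write(value, 0, value.length);
    }

    /** Returns everything written since the last flush and resets the buffer. */
    byte[] flush() {
        byte[] block = buffer.toByteArray();
        buffer.reset();
        return block;
    }

    // Writes a 4-byte big-endian int into the buffer.
    private void writeInt(int n) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(n).array();
        buffer.write(bytes, 0, bytes.length);
    }
}
```

The server side would call write() once per Cell and send the result of flush() as the RPC payload; the client would run the matching decoder over that byte[].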

Greg, you mention omitting fields that are repeated from one KeyValue to
the next.  I think this is basically what the existing DataBlockEncoders
are doing for KeyValues stored on disk (see PrefixKeyDeltaEncoder for
example).  I'm thinking we can use the same encoders for encoding on the
wire.  Different implementations will have different performance
characteristics where some may be better for disk and others for RPC, but
the overall intent is the same.
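
To make the "omit what repeats from the previous key" idea concrete, here is a small prefix-delta sketch over sorted keys. It illustrates the general technique only; it is not the actual PrefixKeyDeltaEncoder, and the names PrefixDeltaSketch and Entry are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixDeltaSketch {
    /** One encoded key: how many leading bytes match the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final byte[] suffix;
        Entry(int shared, byte[] suffix) { this.shared = shared; this.suffix = suffix; }
    }

    /** Encodes each key relative to the key before it. */
    static List<Entry> encode(List<byte[]> keys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : keys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) shared++;
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by reusing the shared prefix of the previous key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            // Start from the first e.shared bytes of prev, then append the suffix.
            byte[] key = Arrays.copyOf(prev, e.shared + e.suffix.length);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For sorted keys such as row1/cf:a followed by row1/cf:b, the second entry carries only the final byte, which is the saving that applies equally on disk and on the wire.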


On Sun, Sep 2, 2012 at 2:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Your "coarse grain" option is what I had in mind in my email. I love the
> option of not needing to get it all right in 0.96.
> You, Matt, and I could talk and work out the details and get it done.
> -- Lars
> ----- Original Message -----
> From: Gregory Chanan <[EMAIL PROTECTED]>
> Cc:
> Sent: Sunday, September 2, 2012 12:52 PM
> Subject: Re: RPC KeyValue encoding
> Lars,
> If we make the KeyValue wire format flexible enough I think we'll be able
> to tackle the KV as an interface work later.  Just throwing out some ideas
> here:
> We could have a byte at the front of each KV serialization format that
> gives various options in each bit e.g.
> Omits Rows / Omits Family / Omits Qualifier / Omits Timestamp / Omits Value
> / plus some extra bytes for compression options and extensions.  Then we
> just need to define where the KV gets its field if it is omitted, e.g. from
> the previous KV in the RPC that had that field filled in.  We sort of have
> this with the optional fields already, although I don't recall exactly how
> protobuf handles those (we'd probably have to do some small restructuring);