HBase >> mail # dev >> RPC KeyValue encoding


Re: RPC KeyValue encoding
I created separate Review Board requests for the hbase-common module and the
hbase-prefix-tree module.  The first one has the Cell interface,
CellOutputStream, CellScanner, etc. mentioned above.

hbase-common: https://reviews.apache.org/r/6897/
hbase-prefix-tree: https://reviews.apache.org/r/6898/

I'll leave the tests out for now.  They're on GitHub.

On Mon, Sep 3, 2012 at 3:17 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> That reminds me of another thought that occurred to me while looking at
> ScanQueryMatcher.
> I was marveling at the relative complexity of it (together with
> StoreScanner) - admittedly some of this is my fault (see HBASE-4536 and
> HBASE-4071)
>
> It would be so much easier to read if we had proper iterator trees (at
> least at some places in the code), similar to how relational databases
> express execution plans (using scanner and iterator interchangeably).
>
> Then:
> - StoreScanner would just read from HFiles and pass KVs up, in fact it
> might no longer be needed
> - Filters can be expressed as an iterator over those KVs.
> - Handling deleted KVs would be another iterator
> - So would be the version handling/counting
> - Scanner would not need to be passed a List<KeyValue> to accumulate KVs in,
> but would simply return KVs as they are encountered.
>
> RegionScanner and StoreScanner would retain the KeyValueHeap to mergesort
> their sub scanners.
> The overall complexity would remain the same, but the parts would be
> isolated better.
>
> Something like this:
> RegionScanner -> HeapIt ->-> VersionCounterIt -> FilterIt -> TimeRangeIt->
> DeletesIt -> StoreScanner -> HeapIt ->-> StoreFileIt -> ...
> (Should rename some of the things)
>
> All iterators would issue (re)seeks when possible.
>
> The iterator interface would be something like <init>, KV next(), close(),
> seek(KV), reseek(KV). Many Iterators would be stateful.
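
To make the iterator-tree idea concrete, here is a minimal sketch of how such a chain could be composed. Everything in it is hypothetical and heavily simplified: the KV class (row, timestamp, delete flag only), the KVIt interface, and the ListIt, DeletesIt and VersionCounterIt classes are stand-ins invented for illustration, not existing HBase classes. It leaves out <init> and reseek(KV), does a linear seek, and resets the delete tracker on seek, which a real implementation could not do so casually.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified KeyValue: ordered by row ascending, then timestamp descending. */
final class KV {
    final String row; final long ts; final boolean isDelete;
    KV(String row, long ts, boolean isDelete) { this.row = row; this.ts = ts; this.isDelete = isDelete; }
    int compareKey(KV o) {
        int c = row.compareTo(o.row);
        return c != 0 ? c : Long.compare(o.ts, ts); // newer timestamps sort first
    }
}

/** Iterator interface along the lines proposed above (reseek omitted). */
interface KVIt {
    KV next();          // next KV, or null when exhausted
    void seek(KV key);  // position at the first KV >= key
    void close();
}

/** Leaf: reads from a pre-sorted in-memory list, standing in for a store file. */
final class ListIt implements KVIt {
    private final List<KV> kvs; private int pos = 0;
    ListIt(List<KV> kvs) { this.kvs = kvs; }
    public KV next() { return pos < kvs.size() ? kvs.get(pos++) : null; }
    public void seek(KV key) {
        pos = 0; // linear scan for simplicity
        while (pos < kvs.size() && kvs.get(pos).compareKey(key) < 0) pos++;
    }
    public void close() {}
}

/** Swallows delete markers and the older KVs of the same row that they cover. */
final class DeletesIt implements KVIt {
    private final KVIt in; private KV marker;
    DeletesIt(KVIt in) { this.in = in; }
    public KV next() {
        for (KV kv; (kv = in.next()) != null; ) {
            if (kv.isDelete) { marker = kv; continue; }
            boolean masked = marker != null && marker.row.equals(kv.row) && kv.ts <= marker.ts;
            if (!masked) return kv;
        }
        return null;
    }
    public void seek(KV key) { marker = null; in.seek(key); } // simplification: forgets earlier markers
    public void close() { in.close(); }
}

/** Passes through at most maxVersions KVs per row. */
final class VersionCounterIt implements KVIt {
    private final KVIt in; private final int maxVersions;
    private String row; private int seen;
    VersionCounterIt(KVIt in, int maxVersions) { this.in = in; this.maxVersions = maxVersions; }
    public KV next() {
        for (KV kv; (kv = in.next()) != null; ) {
            if (!kv.row.equals(row)) { row = kv.row; seen = 0; }
            if (++seen <= maxVersions) return kv;
        }
        return null;
    }
    public void seek(KV key) { row = null; seen = 0; in.seek(key); }
    public void close() { in.close(); }
}

final class IteratorTreeDemo {
    /** Drains an iterator into "row@ts" strings. */
    static List<String> scan(KVIt it) {
        List<String> out = new ArrayList<>();
        for (KV kv; (kv = it.next()) != null; ) out.add(kv.row + "@" + kv.ts);
        return out;
    }

    public static void main(String[] args) {
        List<KV> file = Arrays.asList(
            new KV("a", 9, false), new KV("a", 7, true), new KV("a", 5, false),
            new KV("b", 8, false), new KV("b", 6, false), new KV("b", 4, false));
        // VersionCounterIt -> DeletesIt -> ListIt, read from the top like an execution plan.
        KVIt plan = new VersionCounterIt(new DeletesIt(new ListIt(file)), 2);
        System.out.println(scan(plan)); // prints [a@9, b@8, b@6]
        plan.close();
    }
}
```

Each stage only knows about the iterator below it, so a filter or time-range stage could be slotted in without touching the others. The cost is one virtual call per stage per KV, which is the performance concern raised below.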
>
> Would probably be a major refactoring, and the devil would be in the
> details. We would need to be careful to keep the current performance
> (currently ScanQueryMatcher is efficient, because it does a bunch of steps
> at the same time).
>
> Just "blue skying".
>
> -- Lars
>
>
> ________________________________
> From: Matt Corgan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Monday, September 3, 2012 11:24 AM
> Subject: Re: RPC KeyValue encoding
>
> >
> > For CellAppender, is compile() equivalent to flushing ?
>
> Yes.  I'll rename CellAppender to CellOutputStream.  The concept is very
> similar to a GzipOutputStream where you write bytes to it and periodically
> call flush() which spits out a compressed byte[] behind the scenes.  The
> server would write Cells to a CellOutputStream, flush them to a byte[] and
> send the byte[] to the client.  There could be a default encoding, and the
> client could send a flag to override the default.
>
> Greg, you mention omitting fields that are repeated from one KeyValue to
> the next.  I think this is basically what the existing DataBlockEncoders
> are doing for KeyValues stored on disk (see PrefixKeyDeltaEncoder for
> example).  I'm thinking we can use the same encoders for encoding on the
> wire.  Different implementations will have different performance
> characteristics where some may be better for disk and others for RPC, but
> the overall intent is the same.
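
As a rough illustration of the write-then-flush idea combined with prefix elision, here is a minimal sketch. It is an assumption-laden toy, not the code in the review requests: PrefixCellOutputStream and PrefixCellDecoder are hypothetical names, a "cell" is reduced to a single key byte[], and lengths are written as fixed 2-byte values where a real encoder would more likely use variable-length ints.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffers keys, storing only the bytes that differ from the previous key. */
final class PrefixCellOutputStream {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private byte[] prev = new byte[0];

    /** Appends one key as (sharedPrefixLength, suffixLength, suffixBytes). */
    void write(byte[] key) {
        int shared = 0;
        int max = Math.min(prev.length, key.length);
        while (shared < max && prev[shared] == key[shared]) shared++;
        writeShort(shared);
        writeShort(key.length - shared);
        buf.write(key, shared, key.length - shared);
        prev = key.clone();
    }

    /** Returns everything written since the last flush and resets the encoder state. */
    byte[] flush() {
        byte[] block = buf.toByteArray();
        buf.reset();
        prev = new byte[0]; // each block must be decodable on its own
        return block;
    }

    // Big-endian 2-byte length; assumes keys shorter than 65536 bytes.
    private void writeShort(int v) { buf.write(v >>> 8); buf.write(v); }
}

/** Hypothetical counterpart that the receiving side would run on a flushed block. */
final class PrefixCellDecoder {
    static List<String> decode(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        List<String> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        while (in.hasRemaining()) {
            int shared = in.getShort() & 0xFFFF;
            int suffixLen = in.getShort() & 0xFFFF;
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared); // reuse prefix of the previous key
            in.get(key, shared, suffixLen);            // read only the new suffix
            keys.add(new String(key, StandardCharsets.UTF_8));
            prev = key;
        }
        return keys;
    }
}

final class PrefixEncodingDemo {
    public static void main(String[] args) {
        PrefixCellOutputStream out = new PrefixCellOutputStream();
        for (String k : new String[] {"row1/cf:a", "row1/cf:b", "row2/cf:a"}) {
            out.write(k.getBytes(StandardCharsets.UTF_8));
        }
        byte[] block = out.flush(); // this byte[] is what would go over the wire
        System.out.println(PrefixCellDecoder.decode(block)); // prints [row1/cf:a, row1/cf:b, row2/cf:a]
    }
}
```

With keys this short the fixed-width length fields eat most of the savings; the benefit shows up with long, highly repetitive row keys, which is also where the on-disk encoders pay off.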
>
> Matt
>
> On Sun, Sep 2, 2012 at 2:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Your "coarse grain" option is what I had in mind in my email. I love the
> > option of not needing to get it all right in 0.96.
> >
> > You, Matt, and I could talk and work out the details and get it done.
> >
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Gregory Chanan <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Cc:
> > Sent: Sunday, September 2, 2012 12:52 PM
> > Subject: Re: RPC KeyValue encoding
> >
> > Lars,
> >
> > If we make the KeyValue wire format flexible enough I think we'll be able
> > to tackle the KV as an interface work later.  Just throwing out some