HBase, mail # dev - RPC KeyValue encoding


Re: RPC KeyValue encoding
Matt Corgan 2012-09-03, 18:24
>
> For CellAppender, is compile() equivalent to flushing?

Yes.  I'll rename CellAppender to CellOutputStream.  The concept is very
similar to a GzipOutputStream where you write bytes to it and periodically
call flush() which spits out a compressed byte[] behind the scenes.  The
server would write Cells to a CellOutputStream, flush them to a byte[] and
send the byte[] to the client.  There could be a default encoding, and the
client could send a flag to override the default.
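
A minimal Java sketch of the write-then-flush idea described above. The
class name CellOutputStreamSketch, the method signatures, and the
length-prefixed layout are all assumptions made for illustration; this is
not the HBase API and it applies no compression. A real encoder would
replace the plain length-prefixed layout with whatever encoding is chosen.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a cell stream: buffer cells, then flush to a byte[]. */
class CellOutputStreamSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Buffers one cell as length-prefixed row, qualifier and value (assumed layout). */
    void write(byte[] row, byte[] qualifier, byte[] value) {
        writeField(row);
        writeField(qualifier);
        writeField(value);
    }

    /** Writes a 4-byte big-endian length followed by the field bytes. */
    private void writeField(byte[] field) {
        int len = field.length;
        buffer.write(len >>> 24);
        buffer.write(len >>> 16);
        buffer.write(len >>> 8);
        buffer.write(len);
        buffer.write(field, 0, len);
    }

    /** Returns everything buffered since the last flush and resets the buffer. */
    byte[] flush() {
        byte[] block = buffer.toByteArray();
        buffer.reset();
        return block;
    }
}
```

In this sketch the server side would call write() once per cell and flush()
once per response, then send the returned byte[] as the response payload.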

Greg, you mention omitting fields that are repeated from one KeyValue to
the next.  I think this is basically what the existing DataBlockEncoders
are doing for KeyValues stored on disk (see PrefixKeyDeltaEncoder for
example).  I'm thinking we can use the same encoders for encoding on the
wire.  Different implementations will have different performance
characteristics where some may be better for disk and others for RPC, but
the overall intent is the same.
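
To make "omitting what is repeated from the previous KeyValue" concrete,
here is a simplified Java sketch of prefix-delta encoding over sorted keys.
It is an illustration only and does not reproduce the actual
PrefixKeyDeltaEncoder format; the class name PrefixDeltaSketch, the
one-byte length fields, and the resulting 255-byte key limit are
assumptions chosen to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: store each key as (shared prefix length, suffix). */
class PrefixDeltaSketch {

    /** Encodes keys as [sharedLen][suffixLen][suffix bytes], one record per key. */
    static byte[] encode(List<byte[]> keys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (byte[] key : keys) {
            if (key.length > 255) {
                throw new IllegalArgumentException("sketch supports keys up to 255 bytes");
            }
            // Count the leading bytes this key shares with the previous key.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.write(shared);
            out.write(key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        return out.toByteArray();
    }

    /** Rebuilds each key from the previous key's prefix plus the stored suffix. */
    static List<byte[]> decode(byte[] block) {
        List<byte[]> keys = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(block);
        byte[] prev = new byte[0];
        while (in.hasRemaining()) {
            int shared = in.get() & 0xFF;
            int suffixLen = in.get() & 0xFF;
            byte[] key = Arrays.copyOf(prev, shared + suffixLen);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

The same encode/decode pair works whether the resulting byte[] is written
to a block on disk or sent in an RPC response, which is the point of
reusing the encoders on the wire.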

Matt

On Sun, Sep 2, 2012 at 2:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Your "coarse grain" option is what I had in mind in my email. I love the
> option of not needing to get it all right in 0.96.
>
> You, Matt, and I could talk and work out the details and get it done.
>
>
> -- Lars
>
>
> ----- Original Message -----
> From: Gregory Chanan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Sunday, September 2, 2012 12:52 PM
> Subject: Re: RPC KeyValue encoding
>
> Lars,
>
> If we make the KeyValue wire format flexible enough I think we'll be able
> to tackle the "KV as an interface" work later.  Just throwing out some ideas
> here:
>
> We could have a byte at the front of each KV serialization format that
> gives various options in each bit e.g.
> Omits Rows / Omits Family / Omits Qualifier / Omits Timestamp / Omits Value
> / plus some extra bytes for compression options and extensions.  Then we
> just need to define where the KV gets its field if it is omitted, e.g. from
> the previous KV in the RPC that had that field filled in.  We sort of have
> this with the optional fields already, although I don't recall exactly how
> protobuf handles those (we'd probably have to do some small restructuring);
> what's new is defining what it means when a field is omitted.
>
> There's some overhead with the above for small KVs, so you could also go
> coarser grain, e.g. the Get request/response could have a similar options
> byte like:
> All Share Same Row / All Share Same Family / ... / and one of the bits
> could turn on the finer grain options above (per KeyValue).
>
> The advantage of this is that all we'd have to get right in 0.96.0 is the
> deserialization.  The serialization could just send without any of the
> options turned on.  And we could experiment later with each specific RPC
> call what the best options to use are, as well as what storage to actually
> use client/server side, which you discuss.
>
> Greg
>
> On Sun, Sep 2, 2012 at 9:04 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Thanks for the update, Matt.
> >
> > w.r.t. the Cell class, since it is so fundamental, should it reside in the
> > org.apache.hadoop.hbase namespace as the KeyValue class does?
> > For CellAppender, is compile() equivalent to flushing?
> >
> > Looking forward to your publishing on the reviewboard.
> >
> > On Sat, Sep 1, 2012 at 11:29 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >
> > > RPC encoding would be really nice since there is sometimes significant
> > > wire traffic that could be reduced many-fold.  I have a particular table
> > > that I scan and stream to a gzipped output file on S3, and I've noticed
> > > that while the app server's network input is 100Mbps, the gzipped output
> > > can be 2Mbps!
> > >
> > > Finishing the PrefixTree has been slow because I've saved a couple of
> > > tricky issues to the end and am light on time.  I'll try to put it on
> > > reviewboard Monday despite a known bug.  It is built with some of the
> > > ideas you mention in mind, Lars.  Take a look at the