HBase, mail # dev - Cell Encoders and usage of Cell


Re: Cell Encoders and usage of Cell
Matt Corgan 2013-04-22, 00:36
I'm not 100% clear on what you're asking, Nick.  My understanding is that
Cell and KeyValue are identical with regard to the timestamp.  The timestamp
is part of the identity of the Cell/KeyValue, and each has exactly one
timestamp from a logical perspective.

From a physical/memory perspective, KeyValue is one implementation of Cell
where all fields are fully expanded into a single contiguous byte[].  The
Cell interface adds the ability for a timestamp to be shared behind the
scenes to save memory.  In the case where there are 100 KeyValues with the
same timestamp in an RPC result or disk block, the KeyValue implementation
will require 800 bytes of memory for the timestamps (100 x 8 bytes), but a
Cell implementation can de-duplicate them and store as little as ~8 bytes
for the whole RPC or disk block.
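
To make the arithmetic concrete, here is a rough sketch of the two layouts.
It is not HBase code: TimestampedCell, ExpandedCell and SharedTimestampBlock
are hypothetical names made up for illustration, only the 8 timestamp bytes
are modeled, and Java object overhead is ignored.

```java
import java.nio.ByteBuffer;

public class TimestampSharingSketch {

    /** Hypothetical stand-in for the timestamp part of a Cell-like interface. */
    interface TimestampedCell {
        long getTimestamp();
    }

    /** KeyValue-style cell: the timestamp lives in the cell's own byte[]. */
    static final class ExpandedCell implements TimestampedCell {
        // Only the 8 timestamp bytes are modeled; a real cell also holds
        // row, family, qualifier and value.
        private final byte[] bytes;

        ExpandedCell(long timestamp) {
            this.bytes = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        }

        @Override
        public long getTimestamp() {
            return ByteBuffer.wrap(bytes).getLong();
        }

        int timestampBytes() {
            return bytes.length;
        }
    }

    /** Block-style container: one stored timestamp, read by every cell view. */
    static final class SharedTimestampBlock {
        private final long sharedTimestamp;

        SharedTimestampBlock(long sharedTimestamp) {
            this.sharedTimestamp = sharedTimestamp;
        }

        /** Returns a view that reads the block's single timestamp. */
        TimestampedCell cell() {
            return () -> sharedTimestamp;
        }

        int timestampBytes() {
            return Long.BYTES;
        }
    }

    /** Timestamp bytes used when every cell carries its own copy. */
    static int expandedTimestampBytes(int cellCount, long timestamp) {
        int total = 0;
        for (int i = 0; i < cellCount; i++) {
            total += new ExpandedCell(timestamp).timestampBytes();
        }
        return total;
    }

    /** Timestamp bytes used when all cells share one stored copy. */
    static int sharedTimestampBytes(int cellCount, long timestamp) {
        SharedTimestampBlock block = new SharedTimestampBlock(timestamp);
        for (int i = 0; i < cellCount; i++) {
            // Logically each cell still has exactly one timestamp.
            if (block.cell().getTimestamp() != timestamp) {
                throw new IllegalStateException("shared timestamp mismatch");
            }
        }
        return block.timestampBytes();
    }

    public static void main(String[] args) {
        long timestamp = 1234567890L; // arbitrary value
        System.out.println(expandedTimestampBytes(100, timestamp)); // prints 800
        System.out.println(sharedTimestampBytes(100, timestamp));   // prints 8
    }
}
```

Either way the caller only sees getTimestamp(), which is the point of
separating the interface from the storage layout.
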
On Sun, Apr 21, 2013 at 5:08 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> A related question. Can you clarify the distinction between a Cell and a
> KeyValue as pertains to the timestamp? That is, which of these two concepts
> carries the timestamp as a component of its coordinates? Does a Cell
> contain multiple KeyValue versions or does a KeyValue contain multiple Cell
> versions?
>
> In HBASE-7233, patch v9, I see KeyValue is replaced by Cell in the Get
> result, which implies to me that a Cell contains multiple KeyValue
> versions. I don't see the imported Cell.proto. Presumably that's the same
> Cell type defined in hbase.proto currently on trunk.
>
> Thanks,
> Nick
>
> On Sun, Apr 21, 2013 at 2:47 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
> > fyi Ram - i started adding the Cell interface to the read path of the
> > delta encoders in HBASE-7323
> > <https://issues.apache.org/jira/browse/HBASE-7323>.
> > It's one possible place to start working on it.
> >
> >
> > On Thu, Apr 18, 2013 at 8:19 PM, ramkrishna vasudevan <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Thanks for your reply Stack.
> > > > >I think so.  hfile APIs are about KVs.  Should be about Cell I'd think.
> > > Yes.  This is what i too think.
> > >
> > > > >If you need the above, you are not doing Cell right I'd argue.  The very
> > > > idea of Cell is a disconnect between how it is stored and Cell use.
> > >
> > > > Yes Stack.  I understand this.  I am not introducing the getKeyOffset and
> > > > getKeyLength over there.
> > > > My questions were mainly because, if i have the current code and i would
> > > > want to introduce tags in it, where would i do it?
> > > So if i need tags to be introduced should i start changing the HFile
> > > formats also and only then i would be getting the tags to work?
> > > What do you think here?
> > >
> > > > > I think the Cell Interface needs methods added to allow access to "labels".
> > > Yes.  You are right.
> > >
> > >
> > >
> > > On Fri, Apr 19, 2013 at 6:58 AM, Stack <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Wed, Apr 17, 2013 at 10:16 AM, ramkrishna vasudevan <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > > With the introduction of the new Cell Interface we are providing a way
> > > > > > where both the RPC usage of cell and the usage of Cell in HFile are
> > > > > > unified (abstracted).
> > > > >
> > > > > > The current block encoder which encodes the kvs into hfile blocks will be
> > > > > > enhanced, maybe as BlockEncode2, which will deal with Cell encoding, and
> > > > > > the same will be written to HFile.
> > > > >
> > > > >
> > > > > That is the idea.  Current block encoders are unusable for anything but
> > > > > hfile with their presumption of a particular KeyValue serialization and
> > > > > with hfile context sprinkled throughout.
> > > >
> > > >
> > > >
> > > > > > Does that mean that there are going to be changes to the HFile format
> > > > > > also?  Just checking whether my understanding here is correct or not.
> > > > >
> > > > >
> > > > > I think so.  hfile APIs are about KVs.  Should be about Cell I'd think.
> > > >
> > > >
> > > >
> > > > > > Because as the Cell interface the row, family, qualifier all are treated