
Re: DISCUSS : HFile V3 proposal for tags in 0.96
Here are the changes/differences that the V3 format would introduce (I will
describe them in words, by subcategory).

To reduce code duplication, we would subclass ReaderV3 and WriterV3 from
ReaderV2 and WriterV2 respectively.
No change in V2 and V3.

*KV serialization*
V2 no change
V3 would now also serialize the tags, after the Value part before the
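
To make the layout idea concrete, here is a rough sketch only, not the actual
KeyValue code. The class name, method names, and the 2-byte tags-length
prefix are all assumptions made for illustration, and the sketch stops at
the tags, so it says nothing about what follows them.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; none of these names come from the HBase code base.
final class TaggedCellSketch {

    // Assumed layout: [keyLen:int][valueLen:int][key][value] and, only when
    // includeTags is true, [tagsLen:short][tags] right after the value.
    static byte[] serialize(byte[] key, byte[] value, byte[] tags, boolean includeTags) {
        int size = 4 + 4 + key.length + value.length;
        if (includeTags) {
            if (tags.length > Short.MAX_VALUE) {
                throw new IllegalArgumentException("tags too long for a 2-byte length");
            }
            size += 2 + tags.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        if (includeTags) {
            buf.putShort((short) tags.length).put(tags);
        }
        return buf.array();
    }

    // Reads the tags back; with includeTags == false (the V2 case) the cell
    // is assumed to end right after the value, so there is nothing to read.
    static byte[] readTags(byte[] cell, boolean includeTags) {
        if (!includeTags) {
            return new byte[0];
        }
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen); // skip key and value
        int tagsLen = buf.getShort();
        byte[] tags = new byte[tagsLen];
        buf.get(tags);
        return tags;
    }
}
```

With includeTags set to false the bytes are identical to the untagged
layout, which is the property the V2 path relies on.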

Introduces a new piece of information in the trailer which can be used in V3
to make tags optional.  Suppose the user selects V3 but one CF has no tags.
Then we would write the tag bytes while flushing, but during compaction,
using this header info, we would just avoid writing tags in the compacted
files.  This would mean no impact on read performance after the compaction
has completed.
The V2 code also tries to get this trailer info, but since it is null there
is no impact on any of the existing code.
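
The flush/compaction behaviour described above could be modelled roughly as
follows. This is a toy model under stated assumptions, not the proposed
patch: FileSketch, trailerHasTags, and the 2-byte per-cell tags length are
hypothetical names and sizes chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names come from the HBase code base.
final class TrailerTagFlagSketch {

    static final class FileSketch {
        final List<byte[]> cellTags;   // tags per cell, empty array when none
        final boolean tagBytesOnDisk;  // were tag lengths/tags written at all?
        final Boolean trailerHasTags;  // null models a V2 trailer without the field

        FileSketch(List<byte[]> cellTags, boolean tagBytesOnDisk, Boolean trailerHasTags) {
            this.cellTags = cellTags;
            this.tagBytesOnDisk = tagBytesOnDisk;
            this.trailerHasTags = trailerHasTags;
        }

        // Extra bytes spent on tags, assuming a 2-byte length per cell.
        long tagOverheadBytes() {
            if (!tagBytesOnDisk) {
                return 0;
            }
            long total = 0;
            for (byte[] tags : cellTags) {
                total += 2 + tags.length;
            }
            return total;
        }
    }

    // Flush with V3: tag bytes are always written; the trailer records
    // whether any cell actually carried tags.
    static FileSketch flush(List<byte[]> cellTags) {
        boolean anyTags = false;
        for (byte[] tags : cellTags) {
            if (tags.length > 0) {
                anyTags = true;
            }
        }
        return new FileSketch(cellTags, true, anyTags);
    }

    // Compaction: write tag bytes only if some input trailer says tags exist.
    // A null trailer field (V2) is treated the same as "no tags".
    static FileSketch compact(List<FileSketch> inputs) {
        boolean includeTags = false;
        List<byte[]> merged = new ArrayList<>();
        for (FileSketch f : inputs) {
            includeTags |= Boolean.TRUE.equals(f.trailerHasTags);
            merged.addAll(f.cellTags);
        }
        return new FileSketch(merged, includeTags, includeTags);
    }
}
```

In this model a CF without tags pays the tag-length overhead only in freshly
flushed files; once compacted, the files carry no tag bytes.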

*WriterV3 and ReaderV3*
Tries to handle the tags based on the metadata from the trailer info.  All
the APIs such as seekTo(), next() and getKeyValue() are now able to handle
tags based on the flag passed during the construction of the Readers and
Writers.  We can be sure that for any instance of V2 the includeTags flag
would always be false.
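
As an illustration of the subclassing and the includeTags flag, a minimal
sketch might look like the following. ReaderV2Sketch and ReaderV3Sketch are
hypothetical stand-ins, not the real reader classes, and the cell layout
(int key length, int value length, key, value, optional 2-byte tags length
plus tags) is an assumption for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a V2 reader: includeTags is always false here.
class ReaderV2Sketch {
    protected final ByteBuffer block;
    protected final boolean includeTags;

    ReaderV2Sketch(ByteBuffer block) {
        this(block, false);
    }

    protected ReaderV2Sketch(ByteBuffer block, boolean includeTags) {
        this.block = block.duplicate(); // independent position, shared bytes
        this.includeTags = includeTags;
    }

    // Returns the value of the next cell, or null at the end of the block.
    byte[] next() {
        if (!block.hasRemaining()) {
            return null;
        }
        int keyLen = block.getInt();
        int valueLen = block.getInt();
        block.position(block.position() + keyLen); // skip the key
        byte[] value = new byte[valueLen];
        block.get(value);
        if (includeTags) {
            int tagsLen = block.getShort();
            block.position(block.position() + tagsLen); // step over the tags
        }
        return value;
    }
}

// Hypothetical stand-in for a V3 reader: reuses the V2 parsing and only
// supplies the flag, which would come from the trailer metadata.
final class ReaderV3Sketch extends ReaderV2Sketch {
    ReaderV3Sketch(ByteBuffer block, boolean trailerSaysTags) {
        super(block, trailerSaysTags);
    }
}
```

The only behavioural difference between the two readers in this sketch is
the flag, so the V2 code path is untouched.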

Additional arguments are added to the APIs in the interfaces related to
HFileDataBlockEncoders, BufferedDataBlockEncoders,
HFileDataBlockEncodingContext etc.  Again, for V2 the new APIs would still
behave the same way and there would be no impact on V2-based use cases.
The BufferedDataBlockEncoder, being the base class for all encoders other
than PrefixTree, would now be tag aware.

Trying to keep changes minimal here, but we would ensure that there are no
behavioural changes while using PrefixTree with V2.

*KeyValue class*
Will include changes to have a Tag class inside this.  APIs to identify tags
in a KV would be needed.  There would also be util method changes.

For V2 based read/write flow the existing code path applies with no/minimal

Many test cases have to be changed to accommodate the API changes happening
to the internal interfaces.
I have listed the changes at a high level; maybe once you see a patch, that
will give more clarity. Let me know if further information is needed.

On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:

> Can you share some more details about it?  A graph/chart/table showing the
> specific difference will be helpful.
> Thanks,
> Jimmy
> On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > I have been following comments on HBASE-8496.
> >
> > I think introducing cell tagging through HFile v3 is acceptable.
> >
> > Looking forward to seeing your implementation.
> >
> > Cheers
> >
> > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > [EMAIL PROTECTED]> wrote:
> >
> > > For the past couple of months, we have been working through various
> > > prototypes for supporting inline storage of tags in cells as persisted on
> > > disk. Our goals are to support optional use of tags with minimal changes to
> > > core code while also avoiding performance impacts to users who do not use
> > > tags.
> > >
> > >  For background, refer to the comments in
> > >
> > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > >
> > > and
> > >
> > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > >
> > >  We have iterated on a couple of prototypes that implement tag awareness in
> > > DataBlockEncoders, later as a new type of Codec for Cells. This point is