HBase, mail # dev - DISCUSS : HFile V3 proposal for tags in 0.96


Re: DISCUSS : HFile V3 proposal for tags in 0.96
Ted Yu 2013-07-19, 04:40
bq. V3 would now serialize the tags also after the Value part before the
memstoreTS

Was any consideration given to serializing the tags before the memstoreTS
instead of after?

bq. The BufferedDataBlockEncoder, being the base class for all encoders other
than PrefixTree would now be tag aware.

When would PrefixTree be able to handle tags ?

When a new HFile is opened, would the user be able to specify that there is
no tagging involved? Put another way, after this feature goes in, would
HFile V3 always be written?

Thanks

On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> These are the changes/differences we would be introducing in the V3
> format (I will describe them in words under each subcategory).
>
> To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> ReaderV2 and WriterV2 respectively.
> *HFileBlockFormat*
> *=============*
> No change in V2 and V3.
>
> *KV serialization*
> *============*
> V2 no change
> V3 would now serialize the tags as well, after the Value part and before
> the memstoreTS.
>
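A minimal sketch of the KV byte order described above, to make the position of the tags concrete. The field widths are assumptions for illustration only: 4-byte key and value lengths, a 2-byte tags length, and a fixed 8-byte memstoreTS (the real writer may encode the memstoreTS as a variable-length long), and the key is treated as opaque bytes. `KvLayoutSketch` and `serialize` are hypothetical names, not HBase API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical KV layout (assumed widths):
 *   V2: keyLen(4) valueLen(4) key value memstoreTS(8)
 *   V3: keyLen(4) valueLen(4) key value tagsLen(2) tags memstoreTS(8)
 */
final class KvLayoutSketch {
  static byte[] serialize(byte[] key, byte[] value, byte[] tags,
                          long memstoreTS, boolean includeTags) {
    if (includeTags && tags.length > 0xFFFF) {
      throw new IllegalArgumentException("tags exceed what a 2-byte length can describe");
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(key.length);
      out.writeInt(value.length);
      out.write(key);
      out.write(value);
      if (includeTags) {
        // V3 only: tags follow the value and precede the memstoreTS.
        out.writeShort(tags.length);
        out.write(tags);
      }
      out.writeLong(memstoreTS);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] key = "row1/cf:q".getBytes(StandardCharsets.UTF_8);
    byte[] value = "v".getBytes(StandardCharsets.UTF_8);
    byte[] tags = {1, 2, 3};
    System.out.println(serialize(key, value, tags, 42L, false).length); // 26
    System.out.println(serialize(key, value, tags, 42L, true).length);  // 31
  }
}
```

With tags placed before the memstoreTS, a reader that knows the V3 flag can find the tags from the lengths already read, while the memstoreTS stays the last field of every KV in both versions.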
> *FixedFileTrailer*
> *===========*
> Introduces new information into the trailer which can be used in V3 to
> make tags optional.  Take the case where the user selects V3 but one CF
> has no tags.  We would still write the tag bytes while flushing, but
> during compaction, using this trailer info, we would avoid writing tags
> in the compacted files.  This means no impact on read performance once
> the compaction has completed.
> V2 code also tries to read this trailer info, but since it would be null
> there is no impact on any of the existing code.
>
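A minimal sketch of the compaction-time decision described above, assuming a hypothetical trailer entry `tagsPresent` that is null for V2 files and otherwise records whether any KV in the file carried non-empty tags. The email does not name the trailer field, so `TrailerInfoSketch`, `tagsPresent` and `includeTagsInOutput` are all illustrative names.

```java
import java.util.Arrays;
import java.util.List;

final class CompactionTagsSketch {
  /** Keep tag bytes in the compacted file only if some input file actually had tags. */
  static boolean includeTagsInOutput(List<TrailerInfoSketch> inputs) {
    for (TrailerInfoSketch t : inputs) {
      // A V2 trailer has no entry (null): its KVs never carried tags.
      if (Boolean.TRUE.equals(t.tagsPresent)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<TrailerInfoSketch> flushed = Arrays.asList(
        new TrailerInfoSketch(3, false),  // V3 flush that wrote only empty tag bytes
        new TrailerInfoSketch(2, null));  // older V2 file
    System.out.println(includeTagsInOutput(flushed)); // false
  }
}

/** Hypothetical view of the trailer: only the fields this sketch needs. */
final class TrailerInfoSketch {
  final int majorVersion;
  /** Hypothetical field: null when absent (V2), else whether any KV had tags. */
  final Boolean tagsPresent;

  TrailerInfoSketch(int majorVersion, Boolean tagsPresent) {
    this.majorVersion = majorVersion;
    this.tagsPresent = tagsPresent;
  }
}
```

Using a nullable value lets V2 trailers fall through the same check with no special case, which mirrors the point that V2 code sees this info as null.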
> *WriterV3 and ReaderV3*
> *=================*
> Handle the tags based on the metadata from the trailer info.  All the
> APIs such as seekTo(), next() and getKeyValue() are now able to handle
> tags based on the flag passed when the Readers and Writers are
> constructed.  For any V2 instance the includeTags flag would always be
> false.
>
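A minimal sketch of the subclassing plus constructor-flag approach, assuming a simplified block of KVs laid out as keyLen(4), valueLen(4), key, value, [tagsLen(2), tags], memstoreTS(8). `ReaderV2Sketch`, `ReaderV3Sketch` and `next` are hypothetical stand-ins for the real reader classes and for seekTo()/next()/getKeyValue(); the point is that V3 reuses the V2 walking logic and only supplies the flag from the trailer.

```java
import java.nio.ByteBuffer;

class ReaderDemo {
  public static void main(String[] args) {
    ByteBuffer v3Block = ByteBuffer.allocate(4 + 4 + 1 + 1 + 2 + 2 + 8);
    v3Block.putInt(1).putInt(1).put((byte) 'k').put((byte) 'v')
        .putShort((short) 2).put((byte) 7).put((byte) 8).putLong(5L);
    v3Block.flip();
    System.out.println(new ReaderV3Sketch(true).next(v3Block)); // 2
  }
}

/** Hypothetical V2 reader: only the KV-walking part relevant to tags. */
class ReaderV2Sketch {
  protected final boolean includeTags;

  /** V2 files never carry tags, so this constructor hardwires false. */
  ReaderV2Sketch() { this(false); }

  /** Only subclasses choose the flag. */
  protected ReaderV2Sketch(boolean includeTags) { this.includeTags = includeTags; }

  /** Consumes one KV and returns how many tag bytes it carried. */
  int next(ByteBuffer block) {
    int keyLen = block.getInt();
    int valueLen = block.getInt();
    block.position(block.position() + keyLen + valueLen); // skip key and value
    int tagsLen = 0;
    if (includeTags) {
      tagsLen = block.getShort() & 0xFFFF;                 // V3: tags follow the value
      block.position(block.position() + tagsLen);
    }
    block.getLong();                                       // memstoreTS (fixed 8 bytes here)
    return tagsLen;
  }
}

/** Hypothetical V3 reader: inherits all V2 logic; the flag comes from the trailer. */
class ReaderV3Sketch extends ReaderV2Sketch {
  ReaderV3Sketch(boolean includeTagsFromTrailer) { super(includeTagsFromTrailer); }
}
```

Because the only difference is the flag, a V3 file written without tags (flag false) is walked exactly like a V2 file.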
> *DataBlockEncoders*
> *==============*
> Additional arguments are added to the APIs in the interfaces related to
> HFileDataBlockEncoders, BufferedDataBlockEncoders,
> HFileDataBlockEncodingContext, etc.  Again, for V2 the new APIs would
> behave the same way, so there would be no impact on V2-based use cases.
> The BufferedDataBlockEncoder, being the base class for all encoders other
> than PrefixTree, would now be tag aware.
>
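A minimal sketch of how an extra argument can be added without changing V2 behaviour: the tag-aware logic lives in the shared base class, and the old signature is kept as a final overload that passes `includesTags = false`. The method `kvSize` and the field widths (4-byte lengths, 2-byte tags length, 8-byte memstoreTS) are assumptions for illustration, not the real encoder API.

```java
class EncoderDemo {
  public static void main(String[] args) {
    TagAwareEncoderBaseSketch enc = new DiffLikeEncoderSketch();
    System.out.println(enc.kvSize(9, 1));          // 26
    System.out.println(enc.kvSize(9, 1, 3, true)); // 31
  }
}

/** Hypothetical tag-aware base class shared by the non-PrefixTree encoders. */
class TagAwareEncoderBaseSketch {
  static final int LENGTH_FIELDS = 2 * Integer.BYTES; // keyLen + valueLen
  static final int TAGS_LENGTH_FIELD = Short.BYTES;   // present in V3 only
  static final int MEMSTORE_TS = Long.BYTES;          // fixed width in this sketch

  /** New signature with the extra argument: bytes one KV occupies in a block. */
  int kvSize(int keyLen, int valueLen, int tagsLen, boolean includesTags) {
    int size = LENGTH_FIELDS + keyLen + valueLen + MEMSTORE_TS;
    if (includesTags) {
      size += TAGS_LENGTH_FIELD + tagsLen;
    }
    return size;
  }

  /** Old signature kept for V2 callers: identical to the pre-tag behaviour. */
  final int kvSize(int keyLen, int valueLen) {
    return kvSize(keyLen, valueLen, 0, false);
  }
}

/** A concrete encoder inherits tag awareness from the base without changes. */
class DiffLikeEncoderSketch extends TagAwareEncoderBaseSketch { }
```

An abstract or concrete base class (rather than an interface default method) keeps this pattern usable on older Java versions as well.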
> *PrefixTreeEncoders*
> *==============*
> Trying to keep changes minimal here, but would ensure that there are no
> behavioural changes while using PrefixTree with V2.
>
> *KeyValue class*
> *===========*
> Will include changes to have a Tag class inside this.  APIs to identify
> tags in a KV would be needed.  Util method changes would also be there.
>
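A minimal sketch of a Tag class and a helper that identifies the individual tags inside a KV's tag bytes. The per-tag format used here is an assumption: a 2-byte length covering type plus value, a 1-byte type, then the value bytes. `TagSketch`, `toBytes` and `parse` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TagSketch {
  final byte type;
  final byte[] value;

  TagSketch(byte type, byte[] value) {
    this.type = type;
    this.value = value;
  }

  /** Serializes tags as [2-byte len(type+value)][1-byte type][value] (assumed format). */
  static byte[] toBytes(List<TagSketch> tags) {
    int total = 0;
    for (TagSketch t : tags) {
      if (t.value.length + 1 > 0xFFFF) {
        throw new IllegalArgumentException("tag too long for a 2-byte length");
      }
      total += 2 + 1 + t.value.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(total);
    for (TagSketch t : tags) {
      buf.putShort((short) (1 + t.value.length));
      buf.put(t.type);
      buf.put(t.value);
    }
    return buf.array();
  }

  /** Walks a KV's tag bytes and returns the individual tags. */
  static List<TagSketch> parse(byte[] tagBytes) {
    List<TagSketch> out = new ArrayList<>();
    ByteBuffer buf = ByteBuffer.wrap(tagBytes);
    while (buf.hasRemaining()) {
      int len = buf.getShort() & 0xFFFF;
      byte type = buf.get();
      byte[] value = new byte[len - 1];
      buf.get(value);
      out.add(new TagSketch(type, value));
    }
    return out;
  }

  public static void main(String[] args) {
    List<TagSketch> tags = new ArrayList<>();
    tags.add(new TagSketch((byte) 1, "secret".getBytes(StandardCharsets.UTF_8)));
    tags.add(new TagSketch((byte) 2, new byte[] {7}));
    for (TagSketch t : parse(toBytes(tags))) {
      System.out.println(t.type + " -> " + t.value.length + " bytes");
    }
  }
}
```

Prefixing each tag with its own length lets a reader skip tags of types it does not understand without decoding their values.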
> For V2 based read/write flow the existing code path applies with no/minimal
> changes.
>
> Many test cases have to be changed to accommodate the API changes to the
> internal interfaces.
> I have listed the changes at a high level; maybe once you see a patch it
> will give more clarity.  Let me know if further information is needed.
>
> Regards
> Ram
>
>
> On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
>
> > Can you share some more details about it?  A graph/chart/table showing the
> > specific difference will be helpful.
> >
> > Thanks,
> > Jimmy
> >
> >
> > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > I have been following comments on HBASE-8496.
> > >
> > > I think introducing cell tagging through HFile v3 is acceptable.
> > >
> > > Looking forward to seeing your implementation.
> > >
> > > Cheers
> > >
> > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > For the past couple of months, we have been working through various
> > > > prototypes for supporting inline storage of tags in cells as persisted on
>