HBase, mail # dev - DISCUSS : HFile V3 proposal for tags in 0.96


Re: DISCUSS : HFile V3 proposal for tags in 0.96
Ted Yu 2013-07-19, 05:00
bq. By default code will go with V2.

Good.

Looking forward to the patch.

On Thu, Jul 18, 2013 at 9:57 PM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> >> Any consideration that the tags are serialized before the memstoreTS
> instead of after?
> The reasoning is simple: memstoreTS is optional and appears only in the
> HFile, not in the KV.  In the current design the tags come after the Value
> in the KV structure, so it is better to apply the same order in HFiles.
> >> When would PrefixTree be able to handle tags?
> Maybe my statement confused you.  Please see the point on
> PrefixTreeEncoders in the previous mail.  I meant that, in the current
> design, PrefixKey, DiffKey and FastDiff extend BufferedDataBlockEncoder,
> and hence BufferedDataBlockEncoder is made tag aware.
>
> The PrefixTree codec has been handled separately to make it work with tags.
> >> Put in another way, after this feature goes in, would
> HFile V3 always be written?
> By default code will go with V2.  A user who needs V3 would have to set
> hfile.format.version to 3, which ensures that the system uses V3.
>
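For reference, the setting mentioned above would normally go into hbase-site.xml. A minimal fragment, assuming the property is set cluster-wide through the usual site configuration file (the thread itself only names the property and the value):

```xml
<!-- Opt in to HFile V3; per the thread, the default stays at V2 -->
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
```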
> Thanks Ted.
>
> Regards
> Ram
>
>
> On Fri, Jul 19, 2013 at 10:10 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. V3 would now serialize the tags also after the Value part before the
> > memstoreTS
> >
> > Any consideration that the tags are serialized before the memstoreTS
> > instead of after?
> >
> > bq. The BufferedDataBlockEncoder, being the base class for all encoders
> > other than PrefixTree would now be tag aware.
> >
> > When would PrefixTree be able to handle tags?
> >
> > When a new HFile is opened, would the user be able to specify that there
> > is no tagging involved? Put in another way, after this feature goes in,
> > would HFile V3 always be written?
> >
> > Thanks
> >
> > On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
> > [EMAIL PROTECTED]> wrote:
> >
> > > The changes/differences that we would be introducing in the V3 format
> > > are as follows (I will describe them in words under each subcategory).
> > >
> > > To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> > > ReaderV2 and WriterV2 respectively.
> > >
> > > *HFileBlockFormat*
> > > *=============*
> > > No change in V2 and V3.
> > >
> > > *KV serialization*
> > > *============*
> > > V2: no change.
> > > V3 would now also serialize the tags, after the Value part and before
> > > the memstoreTS.
> > >
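The V3 ordering described above (Value, then tags, then memstoreTS) can be pictured with a small standalone sketch. This is not the HBase implementation: the class and method names are hypothetical, the 2-byte tags-length prefix is an assumption, and memstoreTS is written as a fixed 8-byte long purely to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CellLayoutSketch {
    /**
     * Hypothetical helper that lays out one cell as
     * [keyLen:int][valueLen:int][key][value]([tagsLen:short][tags])[memstoreTS:long].
     * The bracketed tags part is written only when includeTags is true (the V3 case).
     */
    static byte[] serialize(byte[] key, byte[] value, byte[] tags,
                            long memstoreTs, boolean includeTags) {
        if (includeTags && tags.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("tags too long for a 2-byte length prefix");
        }
        int size = 4 + 4 + key.length + value.length
                 + (includeTags ? 2 + tags.length : 0)
                 + 8;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(key.length);
        buf.putInt(value.length);
        buf.put(key);
        buf.put(value);
        if (includeTags) {
            // Assumption: tags are length-prefixed and follow the value directly.
            buf.putShort((short) tags.length);
            buf.put(tags);
        }
        // memstoreTS stays last in both layouts (fixed width here for simplicity).
        buf.putLong(memstoreTs);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "v".getBytes(StandardCharsets.UTF_8);
        byte[] tags = "t1".getBytes(StandardCharsets.UTF_8);
        System.out.println("V2-style length: " + serialize(key, value, tags, 7L, false).length);
        System.out.println("V3-style length: " + serialize(key, value, tags, 7L, true).length);
    }
}
```

Keeping memstoreTS at the end in both cases means the only difference between the two layouts is the optional tags segment after the value.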
> > > *FixedFileTrailer*
> > > *===========*
> > > Introduces a new piece of information into the trailer, which can be
> > > used in V3 to make tags optional.  Suppose the user selects V3 but one
> > > CF has no tags.  Then we would write the tag bytes while flushing, but
> > > during compaction, using this info, we would simply avoid writing tags
> > > in the compacted files.  This would mean no impact on read performance
> > > after the compaction has completed.
> > > The V2 code would also try to get this trailer info, but since it would
> > > be null there is no impact on any of the existing code.
> > >
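The compaction decision described above might look roughly like the following. The thread does not say what the new trailer field actually holds, so this sketch assumes a per-file maximum tags length that is null for V2 files; the `Trailer` class and `shouldWriteTags` method are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class TrailerTagsSketch {
    /** Hypothetical stand-in for the new trailer information. */
    static final class Trailer {
        // Assumption: null for V2 files, 0 for V3 files whose cells carry no tags.
        final Integer maxTagsLength;

        Trailer(Integer maxTagsLength) {
            this.maxTagsLength = maxTagsLength;
        }
    }

    /** Tags are written to the compacted file only if some input file has them. */
    static boolean shouldWriteTags(List<Trailer> inputFiles) {
        for (Trailer trailer : inputFiles) {
            if (trailer.maxTagsLength != null && trailer.maxTagsLength > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Trailer> untagged = Arrays.asList(new Trailer(0), new Trailer(null));
        List<Trailer> tagged = Arrays.asList(new Trailer(0), new Trailer(12));
        System.out.println("untagged inputs, write tags: " + shouldWriteTags(untagged));
        System.out.println("tagged inputs, write tags: " + shouldWriteTags(tagged));
    }
}
```

Treating a missing (null) value the same as "no tags" matches the point that V2 files are unaffected by the new trailer field.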
> > > *WriterV3 and ReaderV3*
> > > *=================*
> > > Tries to handle the tags based on the metadata from the trailer info.
> > > All the APIs like seekTo(), next() and getKeyValue() are now able to
> > > handle tags based on the flag passed during the construction of the
> > > Readers and Writers.  We can be sure that for any instance of V2 the
> > > includeTags flag would always be false.
> > >
> > > *DataBlockEncoders*
> > > *==============*
> > > Additional arguments are added to the APIs in the interfaces related to
> > > HFileDataBlockEncoders, BufferedDataBlockEncoders,
> > > HFileDataBlockEncodingContext etc.  Again, for V2 the new APIs would
> > > still behave the same way and there would be no impact on V2-based use
> > > cases.
> > > The BufferedDataBlockEncoder, being the base class for all encoders
> > > other than PrefixTree, would now be tag aware.
> > >
> > > *PrefixTreeEncoders*