HBase, mail # dev - DISCUSS : HFile V3 proposal for tags in 0.96


Re: DISCUSS : HFile V3 proposal for tags in 0.96
ramkrishna vasudevan 2013-07-19, 12:00
But I am afraid that once the user switches to V3 with tags, he cannot go
back to V2.  If that scenario is possible, do we need a workaround for it?
In particular, if the user has written tags and then tries to read the file
back with the V2 reader, it will not work.

If the user switches to V3 but does not write any tags, and we go with the
option of making tags optional via the FileInfo, then at least after a
compaction the HFile could also be read with the V2 reader.  But I don't
think the user would intend to do this, given that he needs tags for his
use case.
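
To make the space overhead discussed in this thread concrete, here is a rough
sketch, not the actual patch. The class and method names are hypothetical, a
generic base-128 varint stands in for the real VInt encoding, and the boolean
stands in for a per-file "no tags" indication. It only shows that a zero tag
length costs one byte per KV, and that a reader which knows the file has no
tags can step over that byte without decoding it.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; none of these names come from HBase.
final class TagLengthSketch {

    // Generic base-128 varint: 7 payload bits per byte, high bit set
    // when more bytes follow. Assumes value >= 0.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Decodes a varint starting at pos[0] and advances pos[0] past it.
    static int readVarInt(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Lays out one cell as: key/value bytes, tag length (varint), tag bytes.
    static byte[] encodeCell(byte[] keyValue, byte[] tags) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(keyValue, 0, keyValue.length);
        writeVarInt(out, tags.length);
        out.write(tags, 0, tags.length);
        return out.toByteArray();
    }

    // Returns the offset just past the tag section of a cell.
    // If the file-level flag says no cell carries tags, the single
    // zero-length byte is skipped without being decoded.
    static int skipTags(byte[] cell, int tagLenOffset, boolean fileHasNoTags) {
        if (fileHasNoTags) {
            return tagLenOffset + 1;
        }
        int[] pos = {tagLenOffset};
        int tagLen = readVarInt(cell, pos);
        return pos[0] + tagLen;
    }

    public static void main(String[] args) {
        byte[] kv = new byte[10];
        byte[] noTags = encodeCell(kv, new byte[0]);
        System.out.println("overhead with no tags: " + (noTags.length - kv.length));
    }
}
```

In this layout the per-KV cost of enabling V3 without writing tags is the one
length byte; removing even that byte would need a rewrite of the file, for
example during compaction.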

Regards
Ram
On Fri, Jul 19, 2013 at 5:21 PM, Anoop John <[EMAIL PROTECTED]> wrote:

> Jean
>         When V2 is used there won't be any extra bytes, so no overhead
> in the write or read paths.
> When V3 is used and there are no tags present at all, we will have extra
> bytes for writing the tag length.  We are trying to write the tag length
> as a VInt so that this will be only 1 byte.  Then using the FileInfo we
> can avoid the overhead.
>
> Say all the KVs in a file have a tag length of zero (a file trailer
> indicates this); during read we can then avoid reading and decoding the
> tag length and just skip the one tag-length byte.
>
> Regarding avoiding the tag length altogether (even that 1 byte), maybe it
> should be possible during compaction. But I am wondering whether it is
> really needed. The user can select V3 when there is a need for tags.
>
> -Anoop-
>
> On Fri, Jul 19, 2013 at 4:53 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
> > Thanks Ram.
> >
> > One last thing, space-wise. If I understand correctly, between V2 and V3,
> > when tags are deactivated, there will be only a 1 bit difference, so the
> > same storage space is used. If tags are activated but empty, is it going
> > to be the same thing? Or are we going to have all the tag overhead? For
> > example, can we have a byte to say "no tags in that file" in addition to
> > "tags are activated for that file"?
> >
> > So 2 questions.
> >
> > 1) What is the overhead on disk space from the tags?
> > 2) Should we have a flag (bit) per file to say "no tags", even if
> > activated, to limit this overhead and let people activate it for future
> > uses?
> >
> > JMS
> > On 2013-07-19 07:11, "ramkrishna vasudevan" <
> > [EMAIL PROTECTED]> wrote:
> >
> > > >>Based on your details, I think it will be, but very minimal, or
> > > almost invisible, correct?
> > > Yes of course.
> > > > Regarding migration, any file written with V2 would still be read with
> > > > HFileReaderV2, and new files will be written with V3.  So there should
> > > > not be any problem here.  We are testing these things anyway to make
> > > > sure we don't break anything.  Thanks Jean for the interest.
> > >
> > > @Stack
> > > > I will write up the changes foreseen for the Codec to support
> > > > RPC and HFileV3.
> > > > From discussing with Anoop, there are some benefits when the tags are
> > > > written as a byte array and when tags are in memory.  Anyway, I will
> > > > write that up in a separate thread, also considering the inputs on the
> > > > way the current patch has been made.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Fri, Jul 19, 2013 at 4:32 PM, Jean-Marc Spaggiari <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > > Like Ted and St.Ack, I read all of this with great interest and
> > > > > everything looked good to me.
> > > > >
> > > > > My only concern is performance.  Even if tags are disabled, do you
> > > > > foresee some performance impact because everything will now need to
> > > > > be tag aware? Based on your details, I think it will be, but very
> > > > > minimal, or almost invisible, correct?
> > > > >
> > > > > Also, for migration from v2 to v3, if v3 is activated, that will
> > > > > simply be done when HFiles are written, correct? So no real
> > > > > migration process is required?
> > > >
> > > > JM
> > > > > On 2013-07-19 01:13, "Stack" <[EMAIL PROTECTED]> wrote:
> > > >