HBase >> mail # user >> setTimeRange and setMaxVersions seem to be inefficient


Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Ted:

Sure, will do.
I am also implementing the reset method, which sets previousIncludedQualifier
to null before the next row arrives.

Best Regards,

Jerry
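
To illustrate why the reset matters, here is a minimal stand-alone sketch of the decision logic in the quoted filter below. It deliberately uses no HBase classes: the Decision enum and the scanRow driver are hypothetical stand-ins for Filter.ReturnCode and for the scanner loop, the SEEK_NEXT_USING_HINT branch is left out, and qualifiers are plain strings. It is a model of the idea under those assumptions, not the actual filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model of the "first version per qualifier" logic from this thread.
// Decision and scanRow are hypothetical stand-ins, not HBase APIs.
class FirstVersionPerQualifier {
    enum Decision { INCLUDE, NEXT_COL }

    // Qualifier of the most recently included cell, or null if none is pending.
    private byte[] previousIncludedQualifier;

    // Mirrors the shape of filterKeyValue: called once per cell, in scan order.
    Decision filterCell(byte[] qualifier) {
        if (previousIncludedQualifier != null
                && Arrays.equals(previousIncludedQualifier, qualifier)) {
            // An older version of a column that was already included.
            previousIncludedQualifier = null;
            return Decision.NEXT_COL;
        }
        previousIncludedQualifier = qualifier;
        return Decision.INCLUDE;
    }

    // Clears per-row state so the first cell of the next row is judged afresh.
    void reset() {
        previousIncludedQualifier = null;
    }

    // Simplified scanner loop: feeds one row's cells (newest version first per
    // qualifier) to the filter and honors NEXT_COL by skipping the rest of
    // that column. It does not call reset(); the caller decides when to.
    static List<String> scanRow(FirstVersionPerQualifier filter, String[] qualifiers) {
        List<String> included = new ArrayList<>();
        int i = 0;
        while (i < qualifiers.length) {
            String q = qualifiers[i];
            Decision d = filter.filterCell(q.getBytes(StandardCharsets.UTF_8));
            if (d == Decision.INCLUDE) {
                included.add(q);
                i++;
            } else {
                while (i < qualifiers.length && qualifiers[i].equals(q)) {
                    i++;
                }
            }
        }
        return included;
    }

    public static void main(String[] args) {
        FirstVersionPerQualifier filter = new FirstVersionPerQualifier();
        String[] row1 = {"a", "a", "a", "c"}; // three versions of "a", one of "c"
        String[] row2 = {"c", "d"};
        System.out.println(scanRow(filter, row1)); // prints [a, c]
        filter.reset();
        System.out.println(scanRow(filter, row2)); // prints [c, d]
    }
}
```

In this model, if reset() is not called between the two rows, the second row yields only [d]: the state left over from the last column of the first row makes the first "c" cell of the second row look like an older version.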

On Wed, Aug 29, 2012 at 1:47 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Jerry:
> Remember to also implement:
>
> +  @Override
> +  public KeyValue getNextKeyHint(KeyValue currentKV) {
>
> You can log a JIRA for supporting ReturnCode.INCLUDE_AND_NEXT_COL.
>
> Cheers
>
> On Wed, Aug 29, 2012 at 6:59 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
> > Hi Lars:
> >
> > Thanks for spending time discussing this with me. I appreciate it.
> >
> > I tried to implement the setMaxVersions(1) inside the filter as follows:
> >
> > @Override
> > public ReturnCode filterKeyValue(KeyValue kv) {
> >     // If this qualifier was already included, skip its remaining
> >     // versions by jumping to the next column.
> >     if (previousIncludedQualifier != null &&
> >         Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
> >         previousIncludedQualifier = null;
> >         return ReturnCode.NEXT_COL;
> >     }
> >     // Another condition that jumps further ahead using a seek hint.
> >     if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
> >         LOG.info("Match found.");
> >         return ReturnCode.SEEK_NEXT_USING_HINT;
> >     }
> >     // Include this KeyValue and remember its qualifier so that the next
> >     // version of the same qualifier is excluded.
> >     previousIncludedQualifier = kv.getQualifier();
> >     return ReturnCode.INCLUDE;
> > }
> >
> > Does this look reasonable, or is there a better way to achieve this? It
> > would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case, though.
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> > On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi Jerry,
> > >
> > > my answer will be the same again:
> > > Some folks will want the max versions set by the client to be applied
> > > before filters, and some folks will want it to restrict the end result.
> > > It's not possible to have it both ways. Your filter needs to do the right
> > > thing.
> > >
> > >
> > > There's a lot of discussion around this in HBASE-5104.
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: Jerry Lam <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > > Sent: Tuesday, August 28, 2012 1:52 PM
> > > Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
> > >
> > > Hi Lars:
> > >
> > > I see. Please refer to the inline comment below.
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > > On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > What I was saying was: It depends. :)
> > > >
> > > > First off, how do you get to 1000 versions? In 0.94 and later, older
> > > > versions are pruned upon flush, so you need 333 flushes (assuming 3
> > > > versions on the CF) to get 1000 versions.
> > > >
> > >
> > > I forgot that the default number of versions to keep is 3. If this is what
> > > people use most of the time, then yes, you are right for this type of
> > > scenario, where the number of versions to keep per column is small.
> > >
> > > > By that time some compactions will have happened and you're back to
> > > > close to 3 versions (maybe 9, 12, or 15 or so, depending on how many
> > > > store files you have).
> > > >
> > > > Now, if you have that many versions because you set VERSIONS=>1000
> > > > in your CF... Then imagine you have 100 columns with 1000 versions each.
> > > >
> > >
> > > Yes, imagine I set VERSIONS => Long.MAX_VALUE (i.e. I will manage the
> > > versioning myself)
> > >
> > > > In your scenario below you'd do 100000 comparisons if the filter were
> > > > evaluated after the version counting, but only 1100 with the current
> > > > code (or at least in that ballpark).
> > > >
> > >
> > > This is where I don't quite understand what you mean.
> > >
> > > if the framework counts the number of ReturnCode.INCLUDE and then stops