Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars:

Thanks for spending time discussing this with me. I appreciate it.

I tried to implement the setMaxVersions(1) inside the filter as follows:

@Override
public ReturnCode filterKeyValue(KeyValue kv) {
    // If this qualifier is the same as the one included previously,
    // skip its remaining versions and jump to the next column.
    if (previousIncludedQualifier != null &&
            Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
        previousIncludedQualifier = null;
        return ReturnCode.NEXT_COL;
    }
    // Another condition that jumps further ahead using a seek hint.
    if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
        LOG.info("Matched Found.");
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    // Include this KeyValue and remember its qualifier so that the next
    // version of the same qualifier is excluded.
    previousIncludedQualifier = kv.getQualifier();
    return ReturnCode.INCLUDE;
}

Does this look reasonable, or is there a better way to achieve this? It
would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case, though.
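
To make the control flow above easier to check, here is a minimal,
self-contained sketch that simulates it without any HBase dependency. The
names Cell, FirstVersionOnly, scan and buildRow are hypothetical stand-ins
invented for this illustration, and the two-value ReturnCode enum only mimics
the HBase one. The real scanner may handle NEXT_COL differently internally;
the sketch only assumes that NEXT_COL skips the remaining versions of the
current column. The seek-hint branch is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: none of these types come from HBase.
class FirstVersionFilterSketch {

    // Mimics the two return codes used by the simplified filter.
    enum ReturnCode { INCLUDE, NEXT_COL }

    // Hypothetical stand-in for a KeyValue: one version of one column.
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Same state machine as the filter above, minus the seek-hint branch.
    static final class FirstVersionOnly {
        private String previousIncludedQualifier;
        int calls; // how many cells the filter was asked about

        ReturnCode filterKeyValue(Cell cell) {
            calls++;
            if (cell.qualifier.equals(previousIncludedQualifier)) {
                previousIncludedQualifier = null;
                return ReturnCode.NEXT_COL;
            }
            previousIncludedQualifier = cell.qualifier;
            return ReturnCode.INCLUDE;
        }
    }

    // Assumed scanner behaviour: cells arrive grouped by column, newest
    // version first, and NEXT_COL skips the rest of the current column.
    static List<Cell> scan(List<Cell> row, FirstVersionOnly filter) {
        List<Cell> included = new ArrayList<>();
        int i = 0;
        while (i < row.size()) {
            Cell cell = row.get(i);
            if (filter.filterKeyValue(cell) == ReturnCode.INCLUDE) {
                included.add(cell);
                i++;
            } else {
                String current = cell.qualifier;
                while (i < row.size() && row.get(i).qualifier.equals(current)) {
                    i++;
                }
            }
        }
        return included;
    }

    // Builds one row with the given number of columns and versions per column.
    static List<Cell> buildRow(int columns, int versions) {
        List<Cell> row = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            for (int v = 0; v < versions; v++) {
                row.add(new Cell("col" + c, versions - v));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        FirstVersionOnly filter = new FirstVersionOnly();
        List<Cell> included = scan(buildRow(100, 1000), filter);
        System.out.println("included=" + included.size()
                + " filterCalls=" + filter.calls);
    }
}
```

Under these assumptions the filter is consulted at most twice per column:
once to include the newest version and once to return NEXT_COL. One thing the
sketch does not cover is multiple rows. HBase calls Filter.reset() between
rows, so clearing previousIncludedQualifier there would probably be needed to
keep state from one row leaking into the next.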

Best Regards,

Jerry
On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Hi Jerry,
>
> my answer will be the same again:
> Some folks will want the max versions set by the client to be applied before
> filters and some folks will want it to restrict the end result.
> It's not possible to have it both ways. Your filter needs to do the right
> thing.
>
>
> There's a lot of discussion around this in HBASE-5104.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jerry Lam <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Tuesday, August 28, 2012 1:52 PM
> Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
>
> Hi Lars:
>
> I see. Please refer to the inline comment below.
>
> Best Regards,
>
> Jerry
>
> On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > What I was saying was: It depends. :)
> >
> > First off, how do you get to 1000 versions? In 0.94++ older versions are
> > pruned upon flush, so you need 333 flushes (assuming 3 versions on the
> > CF) to get 1000 versions.
> >
>
> I forgot that the default number of versions to keep is 3. If this is what
> people use most of the time, then yes, you are right for this type of
> scenario, where the number of versions per column to keep is small.
>
> > By that time some compactions will have happened and you're back to close
> > to 3 versions (maybe 9, 12, or 15 or so, depending on how many store files
> > you have).
> >
> > Now, if you have that many versions because you set VERSIONS=>1000
> > in your CF... Then imagine you have 100 columns with 1000 versions each.
> >
>
> Yes, imagine I set VERSIONS => Long.MAX_VALUE (i.e. I will manage the
> versioning myself)
>
> > In your scenario below you'd do 100000 comparisons if the filter would be
> > evaluated after the version counting. But only 1100 with the current
> > code (or at least in that ballpark).
> >
>
> This is where I don't quite understand what you mean.
>
> If the framework counts the number of ReturnCode.INCLUDE results and stops
> feeding KeyValues into the filterKeyValue method once it reaches the
> count specified in setMaxVersions (i.e. 1 for the case we discussed),
> shouldn't there be at most 100 comparisons instead of 1100?
> Maybe I don't understand how the current approach works...
>
>
>
> >
> > The gist is: One can construct scenarios where one approach is better
> > than the other. Only one order is possible.
> > If you write a custom filter and you care about these things you should
> > use the seek hints.
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Jerry Lam <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Tuesday, August 28, 2012 7:17 AM
> > Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
> >
> > Hi Lars:
> >
> > Thanks for the reply.