HBase >> mail # user >> setTimeRange and setMaxVersions seem to be inefficient


Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars:

Thanks for spending time discussing this with me. I appreciate it.

I tried to implement the setMaxVersions(1) inside the filter as follows:

@Override
public ReturnCode filterKeyValue(KeyValue kv) {
    // If this qualifier is the one that was just included, skip its
    // remaining (older) versions by jumping to the next column.
    if (previousIncludedQualifier != null
            && Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
        previousIncludedQualifier = null;
        return ReturnCode.NEXT_COL;
    }
    // Another condition that jumps further ahead using a seek hint.
    if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
        LOG.info("Matched Found.");
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    // Include this KeyValue in the result and remember its qualifier so
    // that the next version of the same qualifier is excluded.
    previousIncludedQualifier = kv.getQualifier();
    return ReturnCode.INCLUDE;
}

Does this look reasonable, or is there a better way to achieve this? It
would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case, though.
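
To make the cost of this workaround concrete, here is a minimal, self-contained sketch that does not use HBase at all. The Cell class, the two-value ReturnCode enum, and the toy scan loop are hypothetical stand-ins for illustration only, and the seek-hint branch is left out. It assumes cells arrive grouped by qualifier with the newest version first, and that NEXT_COL makes the scanner skip the remaining versions of the current column.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstVersionOnlySketch {
    // Hypothetical stand-in for the two return codes used here (not the HBase enum).
    enum ReturnCode { INCLUDE, NEXT_COL }

    // Hypothetical stand-in for a KeyValue: qualifier and timestamp only.
    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    private String previousIncludedQualifier = null;
    int filterCalls = 0;

    // Same idea as the filter above: include the first version of a
    // qualifier, then answer NEXT_COL when the same qualifier shows up again.
    ReturnCode filterKeyValue(Cell cell) {
        filterCalls++;
        if (cell.qualifier.equals(previousIncludedQualifier)) {
            previousIncludedQualifier = null;
            return ReturnCode.NEXT_COL;
        }
        previousIncludedQualifier = cell.qualifier;
        return ReturnCode.INCLUDE;
    }

    // Toy scanner: on NEXT_COL, skip every remaining cell of that qualifier.
    List<Cell> scan(List<Cell> cells) {
        List<Cell> result = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell cell = cells.get(i);
            if (filterKeyValue(cell) == ReturnCode.INCLUDE) {
                result.add(cell);
                i++;
            } else {
                String skipped = cell.qualifier;
                while (i < cells.size() && cells.get(i).qualifier.equals(skipped)) {
                    i++;
                }
            }
        }
        return result;
    }

    // Builds one row: `columns` qualifiers, each with `versions` cells, newest first.
    static List<Cell> buildRow(int columns, int versions) {
        List<Cell> cells = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            for (int v = versions; v >= 1; v--) {
                cells.add(new Cell("q" + c, v));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        FirstVersionOnlySketch sketch = new FirstVersionOnlySketch();
        List<Cell> result = sketch.scan(buildRow(100, 1000));
        // prints "100 cells included, 200 filter calls"
        System.out.println(result.size() + " cells included, "
                + sketch.filterCalls + " filter calls");
    }
}
```

In this toy model the filter is called twice per column: once to include the newest version and once more to say NEXT_COL. A combined include-and-skip return code would presumably bring that down to one call per column.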

Best Regards,

Jerry
On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Hi Jerry,
>
> my answer will be the same again:
> Some folks will want the max versions set by the client to be applied before
> filters and some folks will want it to restrict the end result.
> It's not possible to have it both ways. Your filter needs to do the right
> thing.
>
>
> There's a lot of discussion around this in HBASE-5104.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jerry Lam <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Tuesday, August 28, 2012 1:52 PM
> Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
>
> Hi Lars:
>
> I see. Please refer to the inline comment below.
>
> Best Regards,
>
> Jerry
>
> On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > What I was saying was: It depends. :)
> >
> > First off, how do you get to 1000 versions? In 0.94++ older versions are
> > pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF)
> > to get 1000 versions.
> >
>
> I forgot that the default number of versions to keep is 3. If this is what
> people use most of the time, then yes, you are right for this type of
> scenario, where the number of versions kept per column is small.
>
> > By that time some compactions will have happened and you're back to close
> > to 3 versions (maybe 9, 12, or 15 or so, depending on how many store files
> > you have).
> >
> > Now, if you have that many versions because you set VERSIONS=>1000
> > in your CF... Then imagine you have 100 columns with 1000 versions each.
> >
>
> Yes, imagine I set VERSIONS => Long.MAX_VALUE (i.e. I will manage the
> versioning myself)
>
> > In your scenario below you'd do 100000 comparisons if the filter were
> > evaluated after the version counting, but only 1100 with the current code
> > (or at least in that ballpark).
> >
>
> This is where I don't quite understand what you mean.
>
> If the framework counted the number of ReturnCode.INCLUDE results and
> stopped feeding KeyValues into the filterKeyValue method once it reached the
> count specified in setMaxVersions (i.e. 1 for the case we discussed),
> shouldn't there be just 100 comparisons (at most) instead of 1100?
> Maybe I don't understand how the current code works...
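
The ordering proposed in the paragraph above can be modeled with a small sketch. It is only a toy model of that proposal, not of how HBase actually schedules filters and version counting. The CellFilter interface and the scan method are hypothetical names, and the row is assumed to hold 100 columns with 1000 versions each, newest first.

```java
public class IncludeCountingSketch {
    // Hypothetical filter shape: true means "INCLUDE this version".
    interface CellFilter {
        boolean include(String qualifier, long timestamp);
    }

    // Toy model of the proposal: the framework counts the versions the
    // filter has included for the current column and stops calling the
    // filter for that column once maxVersions is reached.
    // Returns { number of filter calls, number of included cells }.
    static int[] scan(int columns, int versions, int maxVersions, CellFilter filter) {
        int filterCalls = 0;
        int included = 0;
        for (int c = 0; c < columns; c++) {
            int includedInColumn = 0;
            for (int v = versions; v >= 1 && includedInColumn < maxVersions; v--) {
                filterCalls++;
                if (filter.include("q" + c, v)) {
                    includedInColumn++;
                    included++;
                }
            }
        }
        return new int[] { filterCalls, included };
    }

    public static void main(String[] args) {
        // A filter that accepts every version: one call per column.
        int[] counts = scan(100, 1000, 1, (qualifier, timestamp) -> true);
        // prints "100 filter calls, 100 cells included"
        System.out.println(counts[0] + " filter calls, " + counts[1] + " cells included");
    }
}
```

Under these assumptions the count is 100 only when the filter accepts the newest version of every column. A filter that rejects some newer versions is called again for each rejected version, so the total grows accordingly.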
>
>
>
> >
> > The gist is: One can construct scenarios where one approach is better than
> > the other. Only one order is possible.
> > If you write a custom filter and you care about these things you should
> > use the seek hints.
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Jerry Lam <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Tuesday, August 28, 2012 7:17 AM
> > Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
> >
> > Hi Lars:
> >
> > Thanks for the reply.