HBase >> mail # user >> setTimeRange and setMaxVersions seem to be inefficient


Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Ted:

Sure, will do.
I also implemented the reset() method to set previousIncludedQualifier to
null before the next row is processed.

Best Regards,

Jerry
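
The reset behaviour mentioned above can be illustrated without an HBase dependency. What follows is only a minimal sketch under stated assumptions, not the actual filter from this thread: `FirstVersionFilterSketch`, `filterQualifier`, and `scanRow` are hypothetical names, the `ReturnCode` enum is a local stand-in for the HBase one, the SEEK_NEXT_USING_HINT branch is left out, and the toy driver assumes that the framework calls reset() between rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, HBase-free stand-in for the filter discussed in this thread.
class FirstVersionFilterSketch {
    // Local stand-in for the two HBase return codes this sketch needs.
    enum ReturnCode { INCLUDE, NEXT_COL }

    // Qualifier of the most recently included cell, or null if none is pending.
    private byte[] previousIncludedQualifier;

    // Mirrors the shape of filterKeyValue(KeyValue), but takes only the qualifier.
    ReturnCode filterQualifier(byte[] qualifier) {
        if (previousIncludedQualifier != null
                && Arrays.equals(previousIncludedQualifier, qualifier)) {
            // A version of this column was already included: skip the rest of it.
            previousIncludedQualifier = null;
            return ReturnCode.NEXT_COL;
        }
        previousIncludedQualifier = qualifier;
        return ReturnCode.INCLUDE;
    }

    // Clears the per-row state so the next row starts fresh.
    void reset() {
        previousIncludedQualifier = null;
    }

    // Toy driver: cells are {qualifier, versionLabel} pairs, sorted by qualifier
    // with the newest version first. It honours NEXT_COL by skipping the
    // remaining versions of the current qualifier, then resets the filter.
    static List<String> scanRow(FirstVersionFilterSketch filter, String[][] cells) {
        List<String> included = new ArrayList<>();
        int i = 0;
        while (i < cells.length) {
            String qualifier = cells[i][0];
            ReturnCode code =
                    filter.filterQualifier(qualifier.getBytes(StandardCharsets.UTF_8));
            if (code == ReturnCode.INCLUDE) {
                included.add(qualifier + "@" + cells[i][1]);
                i++;
            } else {
                while (i < cells.length && cells[i][0].equals(qualifier)) {
                    i++;
                }
            }
        }
        filter.reset(); // assumed to happen between rows
        return included;
    }

    public static void main(String[] args) {
        FirstVersionFilterSketch filter = new FirstVersionFilterSketch();
        String[][] row1 = { {"a", "t2"}, {"a", "t1"}, {"b", "t1"} };
        String[][] row2 = { {"b", "t5"}, {"c", "t1"} };
        System.out.println(scanRow(filter, row1)); // prints [a@t2, b@t1]
        System.out.println(scanRow(filter, row2)); // prints [b@t5, c@t1]
    }
}
```

In this toy run the first row ends on a single-version column "b", so previousIncludedQualifier is still "b" when the row finishes. Without the reset() call, the first cell of the second row (b@t5) would be answered with NEXT_COL and silently dropped.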

On Wed, Aug 29, 2012 at 1:47 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Jerry:
> Remember to also implement:
>
> +  @Override
> +  public KeyValue getNextKeyHint(KeyValue currentKV) {
>
> You can log a JIRA for supporting ReturnCode.INCLUDE_AND_NEXT_COL.
>
> Cheers
>
> On Wed, Aug 29, 2012 at 6:59 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
> > Hi Lars:
> >
> > Thanks for spending time discussing this with me. I appreciate it.
> >
> > I tried to implement the setMaxVersions(1) inside the filter as follows:
> >
> > @Override
> > public ReturnCode filterKeyValue(KeyValue kv) {
> >     // Check whether this is the same qualifier as the one that was
> >     // included previously. If yes, jump to the next column.
> >     if (previousIncludedQualifier != null &&
> >             Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
> >         previousIncludedQualifier = null;
> >         return ReturnCode.NEXT_COL;
> >     }
> >     // Another condition that makes the jump further using a hint.
> >     if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
> >         LOG.info("Matched Found.");
> >         return ReturnCode.SEEK_NEXT_USING_HINT;
> >     }
> >     // Include this KeyValue in the result and keep track of the included
> >     // qualifier so the next version of the same qualifier will be excluded.
> >     previousIncludedQualifier = kv.getQualifier();
> >     return ReturnCode.INCLUDE;
> > }
> >
> > Does this look reasonable, or is there a better way to achieve this? It
> > would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case, though.
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> > On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi Jerry,
> > >
> > > my answer will be the same again:
> > > Some folks will want the max versions set by the client to be applied
> > > before the filters, and some folks will want it to restrict the end
> > > result. It's not possible to have it both ways. Your filter needs to do
> > > the right thing.
> > >
> > >
> > > There's a lot of discussion around this in HBASE-5104.
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: Jerry Lam <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > > Sent: Tuesday, August 28, 2012 1:52 PM
> > > Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
> > >
> > > Hi Lars:
> > >
> > > I see. Please refer to the inline comment below.
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > > On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > What I was saying was: It depends. :)
> > > >
> > > > First off, how do you get to 1000 versions? In 0.94++ older versions
> > > > are pruned upon flush, so you need 333 flushes (assuming 3 versions on
> > > > the CF) to get 1000 versions.
> > > >
> > >
> > > I forgot that the default number of versions to keep is 3. If this is
> > > what people use most of the time, then yes, you are right for this type
> > > of scenario, where the number of versions to keep per column is small.
> > >
> > > > By that time some compactions will have happened and you're back to
> > > > close to 3 versions (maybe 9, 12, or 15 or so, depending on how many
> > > > store files you have).
> > > >
> > > > Now, if you have that many versions because you set VERSIONS=>1000
> > > > in your CF... Then imagine you have 100 columns with 1000 versions
> > > > each.
> > > >
> > >
> > > Yes, imagine I set VERSIONS => Long.MAX_VALUE (i.e. I will manage the
> > > versioning myself)
> > >
> > > > In your scenario below you'd do 100000 comparisons if the filter were
> > > > evaluated after the version counting. But only 1100 with the current
> > > > code. (or at least in that ball park)
> > > >
> > >
> > > This is where I don't quite understand what you mean.
> > >
> > > if the framework counts the number of ReturnCode.INCLUDE and then stops