Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars:

Thanks for confirming the inefficiency of the implementation for this case. In my case, a column can have more than 10K versions, so I need a quick way to stop the scan from digging further into the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that tells the framework to stop and move on to the next column once the number of versions specified in setMaxVersions has been reached.

For now, I guess I have to hack it into the custom filter (i.e., keep the count myself)? If you have a better way to achieve this, please share :)
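
A minimal sketch of that counting workaround, written as plain Java with no HBase classes so it runs on its own. Everything here is a hypothetical stand-in: Verdict, Cell, CountingFilter and scan() only model the idea of a filter that counts included versions per column and asks the scan to move on. HBase's real Filter.ReturnCode does have a NEXT_COL value, but whether returning it saves the same work in a given HBase version is an assumption that needs checking against that version.

```java
import java.util.ArrayList;
import java.util.List;

class VersionLimitSketch {
    /** Local stand-in for a filter's verdict; not HBase's own enum. */
    enum Verdict { INCLUDE, SKIP, NEXT_COL }

    /** Simplified cell: a column qualifier plus a timestamp. */
    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Counts the versions it has included for the current column. */
    static final class CountingFilter {
        private final int maxVersions;
        private final long maxTimestamp;
        private String currentQualifier;
        private int included;
        int calls; // how many cells the filter was asked about

        CountingFilter(int maxVersions, long maxTimestamp) {
            this.maxVersions = maxVersions;
            this.maxTimestamp = maxTimestamp;
        }

        Verdict filter(Cell cell) {
            calls++;
            if (!cell.qualifier.equals(currentQualifier)) {
                currentQualifier = cell.qualifier; // new column: reset the count
                included = 0;
            }
            if (included >= maxVersions) return Verdict.NEXT_COL; // enough versions
            if (cell.timestamp > maxTimestamp) return Verdict.SKIP; // newer than T
            included++;
            return Verdict.INCLUDE;
        }
    }

    /** Toy scan over cells sorted by qualifier, newest timestamp first. */
    static List<Cell> scan(List<Cell> cells, CountingFilter filter) {
        List<Cell> kept = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell cell = cells.get(i);
            Verdict verdict = filter.filter(cell);
            if (verdict == Verdict.INCLUDE) kept.add(cell);
            if (verdict == Verdict.NEXT_COL) {
                // Jump past the remaining versions of this column.
                while (i < cells.size() && cells.get(i).qualifier.equals(cell.qualifier)) i++;
            } else {
                i++;
            }
        }
        return kept;
    }

    /** Column "a" with 10,000 versions, column "b" with 3 versions. */
    static List<Cell> sampleCells() {
        List<Cell> cells = new ArrayList<>();
        for (long ts = 10_000; ts >= 1; ts--) cells.add(new Cell("a", ts));
        for (long ts = 3; ts >= 1; ts--) cells.add(new Cell("b", ts));
        return cells;
    }

    public static void main(String[] args) {
        CountingFilter filter = new CountingFilter(1, 9_998L);
        List<Cell> kept = scan(sampleCells(), filter);
        System.out.println(kept.size() + " cells kept, " + filter.calls + " filter calls");
    }
}
```

In this toy model the filter is consulted only until each column has yielded its one version, plus one extra call per column to signal the jump, rather than once for every stored version. A real filter would also need to reset its count when the row changes, which the sketch leaves out.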

Best Regards,

Jerry

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-27, at 20:11, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Currently filters are evaluated before we do version counting.
>
> Here's a comment from ScanQueryMatcher.java:
>     /**
>      * Filters should be checked before checking column trackers. If we do
>      * otherwise, as was previously being done, ColumnTracker may increment its
>      * counter for even that KV which may be discarded later on by Filter. This
>      * would lead to incorrect results in certain cases.
>      */
>
>
> So this is by design. (Doesn't mean it's correct or desirable, though.)
>
> -- Lars
>
>
> ----- Original Message -----
> From: Jerry Lam <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Cc:
> Sent: Monday, August 27, 2012 2:40 PM
> Subject: setTimeRange and setMaxVersions seem to be inefficient
>
> Hi HBase community:
>
> I tried to use setTimeRange and setMaxVersions to limit the number of KVs
> returned per column. The behaviour is what I would expect:
> setTimeRange(0, T + 1) and setMaxVersions(1) give me ONE version of the KV
> with a timestamp that is less than or equal to T.
> However, I noticed that all versions of the KeyValue for a particular
> column are processed through a custom filter I implemented even though I
> specify setMaxVersions(1) and setTimeRange(0, T+1). I expected that if ONE
> KV of a particular column has ReturnCode.INCLUDE, the framework would jump
> to the next COL instead of iterating through all versions of the column.
>
> Can someone confirm whether this is the expected behaviour (iterating through
> all versions of a column before setMaxVersions takes effect)? If this is the
> expected behaviour, what is your recommendation to speed this up?
>
> Best Regards,
>
> Jerry
>
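
The reasoning in the ScanQueryMatcher comment quoted above can be illustrated with a small self-contained model. This is only a sketch: it is plain Java, not HBase code, and the class name, both method names and the even-timestamp filter are made up for illustration. It compares the two orderings on one column whose versions are listed newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

class EvaluationOrderSketch {
    /** Filter first, then count versions: the order the quoted comment describes. */
    static List<Long> filterThenCount(long[] newestFirst, LongPredicate filter, int maxVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!filter.test(ts)) continue;        // rejected cells never reach the counter
            if (kept.size() >= maxVersions) break; // version limit reached
            kept.add(ts);
        }
        return kept;
    }

    /** Count first, then filter: a rejected cell still uses up a version slot. */
    static List<Long> countThenFilter(long[] newestFirst, LongPredicate filter, int maxVersions) {
        List<Long> kept = new ArrayList<>();
        int counted = 0;
        for (long ts : newestFirst) {
            if (counted >= maxVersions) break;
            counted++;                             // slot consumed before the filter runs
            if (filter.test(ts)) kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] versions = {5, 4, 3, 2, 1};
        LongPredicate evenOnly = ts -> ts % 2 == 0; // hypothetical custom filter
        System.out.println(filterThenCount(versions, evenOnly, 1));
        System.out.println(countThenFilter(versions, evenOnly, 1));
    }
}
```

With a limit of one version, filtering first returns the newest version the filter accepts, while counting first spends the single slot on the newest version and returns nothing when the filter rejects it. The price of filtering first is the one raised in this thread: the filter has to be consulted on versions that the version limit alone would never have returned.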