Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars:

Thanks for confirming the inefficiency of the implementation for this case. In my case, a column can have more than 10K versions, so I need a quick way to stop the scan from digging through the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that tells the framework to stop and move to the next column once the number of versions specified in setMaxVersions is met.

For now, I guess I have to hack it into the custom filter (i.e., keep the count myself)? If you have a better way to achieve this, please share :)
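
A rough sketch of the "keep the count myself" idea follows. It is a self-contained simulation, not HBase code: Cell, VersionLimitingFilter, filterCell and scan are hypothetical names made up for illustration, and the ReturnCode enum here only mirrors the INCLUDE, SKIP and NEXT_COL values of HBase's Filter.ReturnCode. It assumes cells arrive grouped by column, newest version first, and that the scanner skips the rest of a column when it sees NEXT_COL. A real filter would also need to reset its counter at row boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ScanSketch {
    // Stand-in for the subset of HBase's Filter.ReturnCode values used here.
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL }

    // Hypothetical minimal cell: just a column name and a timestamp.
    static final class Cell {
        final String column;
        final long timestamp;
        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical filter that counts its own INCLUDEs per column.
    static final class VersionLimitingFilter {
        private final int maxVersions;
        private final Predicate<Cell> matches;
        private String currentColumn; // column the counter applies to
        private int included;         // INCLUDEs returned for currentColumn
        int calls;                    // number of cells the filter has seen

        VersionLimitingFilter(int maxVersions, Predicate<Cell> matches) {
            this.maxVersions = maxVersions;
            this.matches = matches;
        }

        ReturnCode filterCell(Cell cell) {
            calls++;
            if (!cell.column.equals(currentColumn)) {
                // New column: start counting from zero again.
                currentColumn = cell.column;
                included = 0;
            }
            if (included >= maxVersions) {
                // Enough versions returned: ask the scanner to leave this column.
                return ReturnCode.NEXT_COL;
            }
            if (matches.test(cell)) {
                included++;
                return ReturnCode.INCLUDE;
            }
            return ReturnCode.SKIP;
        }
    }

    // Simulated scanner: consults the filter per cell and honours NEXT_COL.
    static List<Cell> scan(List<Cell> cells, VersionLimitingFilter filter) {
        List<Cell> result = new ArrayList<>();
        String skipColumn = null;
        for (Cell cell : cells) {
            if (cell.column.equals(skipColumn)) {
                continue; // remaining versions of a column we already left
            }
            ReturnCode rc = filter.filterCell(cell);
            if (rc == ReturnCode.INCLUDE) {
                result.add(cell);
            } else if (rc == ReturnCode.NEXT_COL) {
                skipColumn = cell.column;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One column with 10,000 versions, newest first.
        List<Cell> cells = new ArrayList<>();
        for (long ts = 10000; ts >= 1; ts--) {
            cells.add(new Cell("a", ts));
        }
        // Arbitrary match rule for the demo: even timestamps only.
        VersionLimitingFilter filter =
            new VersionLimitingFilter(1, c -> c.timestamp % 2 == 0);
        List<Cell> result = scan(cells, filter);
        System.out.println(result.size() + " cell(s) returned, filter called "
            + filter.calls + " time(s)");
    }
}
```

In this simulation the filter is consulted twice for the 10,000-version column: once for the cell it includes, and once more to return NEXT_COL. The extra call is the cost of signalling "done" on the following cell rather than together with the INCLUDE.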

Best Regards,

Jerry

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-27, at 20:11, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Currently filters are evaluated before we do version counting.
>
> Here's a comment from ScanQueryMatcher.java:
>     /**
>      * Filters should be checked before checking column trackers. If we do
>      * otherwise, as was previously being done, ColumnTracker may increment its
>      * counter for even that KV which may be discarded later on by Filter. This
>      * would lead to incorrect results in certain cases.
>      */
>
>
> So this is by design. (Doesn't mean it's correct or desirable, though.)
>
> -- Lars
>
>
> ----- Original Message -----
> From: Jerry Lam <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Cc:
> Sent: Monday, August 27, 2012 2:40 PM
> Subject: setTimeRange and setMaxVersions seem to be inefficient
>
> Hi HBase community:
>
> I tried to use setTimeRange and setMaxVersions to limit the number of KVs
> returned per column. The behaviour is as I would expect:
> setTimeRange(0, T + 1) and setMaxVersions(1) give me ONE version of the KV
> with a timestamp less than or equal to T.
> However, I noticed that all versions of the KeyValue for a particular
> column are processed through a custom filter I implemented even though I
> specify setMaxVersions(1) and setTimeRange(0, T + 1). I expected that if ONE
> KV of a particular column has ReturnCode.INCLUDE, the framework would jump
> to the next column instead of iterating through all versions of the column.
>
> Can someone confirm whether this is the expected behaviour (iterating through
> all versions of a column before setMaxVersions takes effect)? If it is
> expected, what is your recommendation to speed this up?
>
> Best Regards,
>
> Jerry
>