Re: querying data on the basis of timestamp
Ted Yu 2013-03-14, 23:03
What you are asking looks similar to this:
HBASE-5010 Filter HFiles based on TTL
It went into 0.94.0
On Thu, Mar 14, 2013 at 3:53 PM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:
> I have a question regarding query performance for rows greater than a
> timestamp. The use case is this:
> I want to find all the rows in a key range that have changed after a
> certain timestamp and up to a certain timestamp, i.e. exactly using this
> SCAN api:
> Scan setTimeRange(long minStamp, long maxStamp)
> Get versions of columns only within the specified timestamp
> range, [minStamp, maxStamp)
> Would this query go through all the rows in the key range or is there an
> optimization that makes it faster?
> I ask because I read about such an optimization in the following paper:
> Here is the excerpt:
> "For data stored in HBase that is time-series or contains a specific,
> known timestamp, a special timestamp file selection algorithm
> was added. Since time moves forward and data is rarely inserted
> at a significantly later time than its timestamp, each HFile will
> generally contain values for a fixed range of time. This
> information is stored as metadata in each HFile and queries that
> ask for a specific timestamp or range of timestamps will check if
> the request intersects with the ranges of each file, skipping those
> which do not overlap. "
> This will work perfectly for my use case but I don't know if this
> optimization, or any other for this use case, exists in Apache HBase.
> The version of Apache HBase we are currently using is 0.92.1 but
> we are considering moving to 0.94.
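
For illustration, the file selection described in the quoted excerpt can be sketched in plain Java. This is only a sketch of the idea, not HBase's actual code: the class TimeRangeFileSelection, the FileInfo type and the select method are hypothetical names, and the per-file range is assumed to be an inclusive [minTs, maxTs] pair. The query range is treated as half-open, [minStamp, maxStamp), as in the setTimeRange description quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types are hypothetical, not HBase classes.
class TimeRangeFileSelection {

    /** Hypothetical per-file metadata: inclusive timestamp range [minTs, maxTs]. */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns the names of the files whose timestamp range intersects the
     * half-open query range [minStamp, maxStamp). Other files are skipped.
     */
    static List<String> select(List<FileInfo> files, long minStamp, long maxStamp) {
        List<String> selected = new ArrayList<>();
        for (FileInfo f : files) {
            // Overlap test: the file's newest value is not older than minStamp
            // and its oldest value is strictly older than maxStamp.
            if (f.maxTs >= minStamp && f.minTs < maxStamp) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("hfile-a", 100L, 199L));
        files.add(new FileInfo("hfile-b", 200L, 299L));
        files.add(new FileInfo("hfile-c", 300L, 399L));
        // Only hfile-b overlaps [250, 300); hfile-c starts exactly at the
        // exclusive upper bound, so it is skipped.
        System.out.println(select(files, 250L, 300L)); // prints [hfile-b]
    }
}
```

With files that each cover a narrow time window, a query for a recent range only has to open the few files that overlap it. Rows in the key range still have to be read from those files.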