-
Query a version of a column efficiently
Jerry Lam 2012-07-26, 17:43
Hi HBase gurus:
I need some advice on a problem I'm facing with HBase. How can I efficiently query a version of a column when I don't know exactly which version I'm looking for? For instance, I want to query a column for a timestamp that is less than or equal to N. If version = N is available, return it. Otherwise, I want the version closest to N, in descending order of timestamp. That is, if version = N - 1 exists, I want it to be returned.
I looked into the TimeRange query, but it doesn't seem to provide this semantic naturally. Note that I don't know which version is closest to N, so the range would have to be setTimeRange(0, N+1). Do I need to implement a filter to do that, or is it already available?
Any help will be appreciated.
Best Regards,
Jerry
-
Re: Query a version of a column efficiently
Stack 2012-07-26, 21:13
On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> I need some advice on a problem I'm facing with HBase. How can I
> efficiently query a version of a column when I don't know exactly which
> version I'm looking for?
> For instance, I want to query a column for a timestamp that is less than
> or equal to N. If version = N is available, return it. Otherwise, I want
> the version closest to N, in descending order of timestamp. That is, if
> version = N - 1 exists, I want it to be returned.
Have you tried a timerange w/ minStamp of N and maxStamp of HConstants#LATEST_TIMESTAMP (Long.MAX_VALUE), returning one version only (setMaxVersions(1))?
St.Ack
-
Re: Query a version of a column efficiently
Jerry Lam 2012-07-26, 21:40
Hi St.Ack:
Let's say there are 5 versions of a column A with timestamps = [0, 1, 3, 6, 10]. I want to execute an efficient query that returns the one version of the column whose timestamp is 5 or less. So in this case, it should return the value of column A with timestamp = 3.
Using setTimeRange(5, Long.MAX_VALUE) with setMaxVersions(1), my guess is that it will return version 6, not version 3. Correct me if I'm wrong.
Best Regards,
Jerry
-
Re: Query a version of a column efficiently
Tom Brown 2012-07-26, 22:05
Somebody will correct me if I'm wrong, but I think that for your example, you should use setTimeRange(0, 6) and setMaxVersions(1). It's my understanding that those settings will give you the 1 latest version from all applicable versions (0 <= timestamp <= 5, since the upper bound of the time range is exclusive).
Since it's pretty easy to set the timestamp of a row when you update it, try it, and see if it's what you want.
--Tom
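
Tom's suggestion can be checked against Jerry's example without a cluster. The following is a minimal sketch in plain Java, not HBase client code: a TreeMap stands in for the versions of column A, and the hypothetical helper newestInRange models a read restricted by setTimeRange(minStamp, maxStamp) with a single version returned. It assumes the time range is half-open (minimum inclusive, maximum exclusive), which is why N + 1 is passed as the upper bound.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of "newest version inside a time range"; not an HBase class.
class VersionLookup {

    /**
     * Returns the newest timestamp t with minStamp <= t < maxStamp,
     * or null when no version falls inside the range.
     */
    static Long newestInRange(NavigableMap<Long, String> versions,
                              long minStamp, long maxStamp) {
        // (inclusive, exclusive) bounds mirror the assumed half-open time range.
        NavigableMap<Long, String> inRange =
                versions.subMap(minStamp, true, maxStamp, false);
        return inRange.isEmpty() ? null : inRange.lastKey();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> columnA = new TreeMap<>();
        for (long ts : new long[] {0, 1, 3, 6, 10}) {
            columnA.put(ts, "value@" + ts);
        }
        long n = 5;

        // Range [0, N + 1): the newest version at or before N.
        System.out.println(newestInRange(columnA, 0, n + 1));          // 3

        // Range [N, Long.MAX_VALUE): the newest version at or after N.
        System.out.println(newestInRange(columnA, n, Long.MAX_VALUE)); // 10
    }
}
```

In this model the range starting at N yields 10 rather than 6 or 3, because a single-version read keeps the newest entry in the range. Whether a real cluster behaves the same way is best confirmed by writing a few cells with explicit timestamps, as suggested above.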
-
Re: Query a version of a column efficiently
Stack 2012-07-26, 22:30
On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> Hi St.Ack:
>
> Let's say there are 5 versions of a column A with timestamps = [0, 1, 3, 6, 10].
> I want to execute an efficient query that returns the one version of the
> column whose timestamp is 5 or less. So in this case, it should return the
> value of column A with timestamp = 3.
>
> Using setTimeRange(5, Long.MAX_VALUE) with setMaxVersions(1), my guess
> is that it will return version 6, not version 3. Correct me if I'm wrong.
What Tom says, try it. IIUC, it'll give you your 3. It won't give you 6 since that is outside of the timerange (try 0 instead of MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would have to check code).
St.Ack
-
Re: Query a version of a column efficiently
Jerry Lam 2012-07-26, 23:34
Hi St.Ack:
Can you tell me which source code is responsible for this logic? The source code of Get and Scan doesn't give any indication of how setTimeRange works.
Best Regards,
Jerry
-
Re: Query a version of a column efficiently
Suraj Varma 2012-07-30, 16:53
You may need to set up your Eclipse workspace and search using references etc. To get started, this is one class that uses TimeRange-based matching: org.apache.hadoop.hbase.regionserver.ScanQueryMatcher. Also, Get is internally implemented as a Scan over a single row.
Hope this gets you started. --Suraj
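
As a rough mental model of what time-range matching during a scan has to do, the sketch below walks the versions of one column from newest to oldest. It is a simplification under stated assumptions, not the actual ScanQueryMatcher code: the class TimeRangeScanSketch and its match method are hypothetical, versions are assumed to arrive sorted newest first, and the time range is assumed to be half-open (minimum inclusive, maximum exclusive).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of time-range matching over one column.
class TimeRangeScanSketch {

    /**
     * Walks timestamps sorted in descending order and collects up to
     * maxVersions of them that satisfy minStamp <= ts < maxStamp.
     */
    static List<Long> match(long[] timestampsNewestFirst,
                            long minStamp, long maxStamp, int maxVersions) {
        List<Long> matched = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            if (matched.size() >= maxVersions) {
                break;      // enough versions collected
            }
            if (ts >= maxStamp) {
                continue;   // too new: skip and keep looking at older versions
            }
            if (ts < minStamp) {
                break;      // too old: every remaining version is older still
            }
            matched.add(ts);
        }
        return matched;
    }

    public static void main(String[] args) {
        long[] columnA = {10, 6, 3, 1, 0};  // newest first

        System.out.println(match(columnA, 0, 6, 1));               // [3]
        System.out.println(match(columnA, 0, 6, 2));               // [3, 1]
        System.out.println(match(columnA, 5, Long.MAX_VALUE, 1));  // [10]
    }
}
```

Because versions are visited newest first, the first one that falls inside the range is also the closest one at or below the upper bound, so a single pass with an early exit is enough.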
-
Re: Query a version of a column efficiently
Jerry Lam 2012-08-01, 21:41
Thanks Suraj. I looked at the code, but the logic doesn't appear to be self-contained, particularly the part where HBase searches for a specific version using TimeRange.
Best Regards,
Jerry