HBase, mail # user - Understanding scan behaviour


Re: Understanding scan behaviour
James Taylor 2013-03-29, 07:44
Mohit,
Are you trying to reduce the amount of data you scan and bring
down your query time when:
- your row key is a multi-part key made up of a string and a time value, and
- you know the prefix of the string and a range for the time value?
That's possible (but not easy) to do with HBase using the filter's
ability to return a seek hint to jump to the next set of contiguous
rows. If the cardinality of your string value isn't too large, this
approach can make a pretty dramatic performance improvement.
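
A minimal sketch of the seek-hint idea, in plain Python rather than HBase's Java filter API. It assumes rows are sorted as (id, timestamp) tuples with integer timestamps, and uses bisect as a stand-in for the server-side seek; the name skip_scan and the sample data are hypothetical, not anything from HBase or Phoenix.

```python
from bisect import bisect_left

def skip_scan(rows, id_prefix, t_start, t_end):
    """Return (matches, seeks) for ids starting with id_prefix and
    t_start < timestamp < t_end, over rows sorted as (id, timestamp)."""
    matches, seeks = [], 0
    # Seek straight to the first row whose id could carry the prefix.
    i = bisect_left(rows, (id_prefix,))
    seeks += 1
    while i < len(rows) and rows[i][0].startswith(id_prefix):
        row_id, ts = rows[i]
        if ts <= t_start:
            # Before the time range: jump to the first row of this id
            # that can fall inside it.
            i = bisect_left(rows, (row_id, t_start + 1), i)
            seeks += 1
        elif ts >= t_end:
            # Past the time range: jump to the first row of the next id.
            i = bisect_left(rows, (row_id + "\x00",), i)
            seeks += 1
        else:
            matches.append(rows[i])
            i += 1
    return matches, seeks

# Hypothetical sample data: three ids share the prefix "abc*" or sit near it.
rows = sorted([
    ("aaa", 5), ("abc1", 1), ("abc1", 5), ("abc1", 9),
    ("abc2", 2), ("abc2", 6), ("abd", 5),
])
matches, seeks = skip_scan(rows, "abc", 3, 8)
print(matches, seeks)
```

The scan only touches the rows inside each id's time range plus one or two seeks per distinct id, which is why the approach pays off when the number of distinct ids is modest.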

You should take a look at Phoenix
(https://github.com/forcedotcom/phoenix), a SQL skin on top of HBase -
we just introduced the above optimization. You'd create your table like
this:

CREATE TABLE t1 (id VARCHAR not null, timestamp DATE not null
    CONSTRAINT pk PRIMARY KEY (id, timestamp));

Then your query would look like this:

SELECT id, timestamp FROM t1
WHERE id LIKE 'abc%' AND timestamp > ? AND timestamp < ?;

and you'd bind the ? using the regular JDBC PreparedStatement APIs.

Regards,
James
@JamesPlusPlus

On 03/28/2013 11:20 PM, ramkrishna vasudevan wrote:
> Mohit,
>
> It is always better to use a start row and an end row if you know
> what they are.
> Just add one more byte to the actual (inclusive) end row to form the
> end key.  This will narrow down the search.
>
> Remember that byte comparison is how HBase orders and scans rows.
> Regards
> Ram
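
One reading of "add one more byte" is to append a zero byte to the last row you want, because a scan's stop row is exclusive. A small Python sketch of that reading follows; the helper names stop_after and scan and the sample keys are made up for illustration, and a sorted list of byte strings stands in for the table.

```python
from bisect import bisect_left

def stop_after(last_row: bytes) -> bytes:
    """Smallest key that sorts strictly after last_row in byte order."""
    return last_row + b"\x00"

def scan(keys, start_row: bytes, stop_row: bytes):
    """Rows in [start_row, stop_row): start inclusive, stop exclusive."""
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)
    return keys[lo:hi]

# Hypothetical keys; Python compares bytes lexicographically, byte by byte.
keys = sorted([b"abc", b"abc\x01", b"abd", b"abe"])

# Using b"abd" directly as the stop row would leave it out;
# stop_after(b"abd") keeps it in and still excludes b"abe".
print(scan(keys, b"abc", stop_after(b"abd")))
```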
>
> On Fri, Mar 29, 2013 at 11:18 AM, Li, Min <[EMAIL PROTECTED]> wrote:
>
>> Hi, Mohit,
>>
>> Try using ENDROW. STARTROW and ENDROW together are much faster than
>> PrefixFilter.
>>
>> "+" ASCII code is 43
>> "," ASCII code is 44
>>
>> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++', ENDROW => '+++,'}
>>
>> Min
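
The '++++' to '+++,' pair above can be derived mechanically: increment the last byte of the prefix to get an exclusive end row. A hedged Python sketch, where prefix_stop_row is a hypothetical helper and not an HBase API:

```python
def prefix_stop_row(prefix: bytes) -> bytes:
    """Exclusive stop row covering every key that starts with prefix."""
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # Nothing left to increment: no upper bound, scan to the end.
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])

# "+" is 43 and "," is 44, so the last "+" becomes ",".
assert ord("+") == 43 and ord(",") == 44
print(prefix_stop_row(b"++++"))
```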
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, March 29, 2013 1:18 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Understanding scan behaviour
>>
>> Could the prefix filter lead to a full table scan? In other words, is
>> PrefixFilter applied after fetching the rows?
>>
>> Another question: say I have row keys abc and abd and I search for
>> row "abc". Is it always guaranteed to be the first key returned in the
>> scan results? If so, I can always put a condition in the client app.
>>
>> On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> Take a look at the following in
>>> hbase-server/src/main/ruby/shell/commands/scan.rb
>>> (trunk)
>>>
>>>    hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
>>>      (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter (123, 456))"}
>>>
>>> Cheers
>>>
>>> On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> I see, then I misunderstood the behaviour. My keys are id + timestamp so
>>>> that I can do a range type search. So what I really want is to return a
>>>> row where id matches the prefix. Is there a way to do this without having
>>>> to scan large amounts of data?
>>>>
>>>>
>>>>
>>>> On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Mohit,
>>>>>
>>>>> "+" ASCII code is 43.
>>>>> "9" ASCII code is 57.
>>>>>
>>>>> So "+9" comes after "++". If you don't have any row with the exact
>>>>> key "+++++", HBase will look for the first one after this one. And in
>>>>> your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
>>>>>
>>>>> JM
>>>>>
>>>>> 2013/3/28 Mohit Anchlia <[EMAIL PROTECTED]>:
>>>>>> My understanding is that the row key would start with +++++, for
>>>>>> instance.
>>>>>> On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hi Mohit,
>>>>>>>
>>>>>>> I see nothing wrong with the results below. What would you have
>>>>>>> expected?
>>>>>>> JM
>>>>>>>
>>>>>>> 2013/3/28 Mohit Anchlia <[EMAIL PROTECTED]>:
>>>>>>>> I am running version 92.1 and this is what happens.
>>>>>>>>
>>>>>>>> hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1,