HBase, mail # user - Scanning half a key or value in HBase


Re: Scanning half a key or value in HBase
Andrey Stepachev 2010-08-24, 20:36
This is a different case, because it is not a prefix scan. It is an
inclusive scan.
In this case you should of course use a technique such as indexing
or a full scan with a filter.
But this is not a real-time solution once a noticeable number of keys
is found
(you can achieve ~4 ms per record, so for example 1000 rows take about 4 sec).
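
To make the cost argument concrete, here is a minimal sketch. It assumes that "inclusive" means matching a fragment anywhere in the key, and it uses a plain TreeSet as a stand-in for the sorted row keys, not the HBase API. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Sketch only: a TreeSet stands in for the sorted row keys of a table.
final class FullScanSketch {

    // Full scan with a filter: every key is examined, and only matches are kept.
    // The sort order does not help, because the fragment can sit anywhere in the key.
    static List<String> scanContaining(Iterable<String> sortedKeys, String fragment) {
        List<String> hits = new ArrayList<>();
        for (String key : sortedKeys) {
            if (key.contains(fragment)) {
                hits.add(key);
            }
        }
        return hits;
    }

    // Rough cost, in seconds, of fetching the matching records one at a time.
    static double estimateSeconds(int records, double millisPerRecord) {
        return records * millisPerRecord / 1000.0;
    }

    public static void main(String[] args) {
        TreeSet<String> keys =
                new TreeSet<>(Arrays.asList("122_foo", "123_bar", "123_foo", "124_foo2"));
        System.out.println(scanContaining(keys, "foo"));
        System.out.println(estimateSeconds(1000, 4.0) + " s");
    }
}
```

With the ~4 ms per record figure above, estimateSeconds(1000, 4.0) gives the 4 seconds mentioned for 1000 rows.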

2010/8/24 Michelan Arendse <[EMAIL PROTECTED]>:
> This works wonderfully. Have a look at the code, because it's still a bit slow and I need it to be lightning fast.

> IndexedTable table = new IndexedTable(_hbManager.getConfiguration(), Bytes.toBytes("Table"));
> ResultScanner scanner = table.getIndexedScanner("IndexId", null,  null, null, filter,
>                new byte[][] {Bytes.toBytes("Colum_Family:column1")});

Under the hood, IndexedTable performs a Get for each row found in the index,
so you can't
achieve very fast index scans. Only denormalization can help.

>
> -----Original Message-----
> From: Andrey Stepachev [mailto:[EMAIL PROTECTED]]
> Sent: 23 August 2010 09:11 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scanning half a key or value in HBase
>
> If my table is huge do I get a full scan?
> If you want to get good performance on random reads,
> you really need start and stop keys.
> PrefixFilters are usable in compound filters. If you want
> only one range (like 123_*), you must use start/stop keys.
>
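
A minimal sketch of the start/stop-key idea for a single range such as 123_*. It does not use HBase classes: a TreeMap with an unsigned byte comparator stands in for the table, and the names PrefixRangeSketch, stopKeyFor and rowsWithPrefix are hypothetical. The stop key is the prefix with its last byte incremented, as suggested earlier in this thread. In real code the two byte arrays would be passed to the scan as its start row and stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for a table whose rows are sorted by key bytes.
final class PrefixRangeSketch {

    // Lexicographic order with bytes treated as unsigned values.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // Exclusive stop key for a prefix: drop trailing 0xFF bytes, then add 1 to the last byte.
    // Returns an empty array when there is no upper bound (empty or all-0xFF prefix).
    static byte[] stopKeyFor(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xff) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Range scan [prefix, stopKeyFor(prefix)) over the given keys.
    static List<String> rowsWithPrefix(String prefix, String... keys) {
        TreeMap<byte[], String> table = new TreeMap<>(UNSIGNED);
        for (String k : keys) {
            table.put(bytes(k), k);
        }
        byte[] start = bytes(prefix);
        byte[] stop = stopKeyFor(start);
        // An empty stop key means "no upper bound", so read to the end of the table.
        Collection<String> hits = stop.length == 0
                ? table.tailMap(start, true).values()
                : table.subMap(start, true, stop, false).values();
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        System.out.println(rowsWithPrefix("123_",
                "122_zzz", "123_foo", "123_foo2", "123_foo3", "1234_x", "124_a"));
    }
}
```

Only the rows inside the range are touched, which is why this stays fast on a huge table. Note that 1234_x sorts before 123_ and is therefore outside the range.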
> 2010/8/23 Samuru Jackson <[EMAIL PROTECTED]>:
>> Hi,
>>
>> I do it this way:
>>
>> The variable searchValue is my Prefix like in your case 123 would be:
>>
>> searchValue = "123";
>>
>> PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes(searchValue));
>> Scan scan = new Scan();
>> scan.addFamily(Bytes.toBytes(this.REF_FAM));
>> scan.setFilter(prefixFilter);
>> ResultScanner resultScanner = hBaseTable.getScanner(scan);
>>
>> Now you can iterate over the resultScanner.
>>
>> Is this what you were looking for?
>>
>> /SJ
>>
>>
>>
>>
>> On Mon, Aug 23, 2010 at 6:00 AM, Michelan Arendse <[EMAIL PROTECTED]>
>> wrote:
>>> Hi,
>>>
>>> Thanks for the responses but it's still not what I am really looking for.
>>>
>>> The row id looks something like: number_string so it would be 123_foo,
>>> 123_foo2 123_foo3.
>>> So now I want to find all the foo's that are related to the first half of
>>> the key which is "123".
>>>
>>> Also I can't add start row if I do not know where 123 starts. And I can't
>>> search for the start row, as I need this to be very fast.
>>>
>>> Thanks.
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:[EMAIL PROTECTED]]
>>> Sent: 17 August 2010 09:01 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: Scanning half a key or value in HBase
>>>
>>> Hey,
>>>
>>> One thing to watch out for is ascii with separator variable length
>>> keys, you would think if your key structure was:
>>>
>>> foo:bar
>>>
>>> starting at 'foo' and ending at 'foo:' might give you only keys which
>>> start with 'foo:' but this doesn't work like that.  You also get keys
>>> like:
>>> foo123:bar
>>>
>>> you must start the scan at 'foo:' but you can't just end it at 'foo;'
>>> (since next(:) == ';' in ascii), this has to do with the ordering of
>>> ASCII, for a reference look at http://www.asciitable.com/
>>>
>>> The bug-free solution is to start your scan at 'foo:' and use a prefix
>>> filter set to 'foo:'.
>>>
>>> If you are scanning fixed-width keys, eg: binary conversions of longs,
>>> then the [start,start+1) solution works.
>>>
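
A small sketch of the ordering behind the two cases above, using plain byte arrays rather than HBase classes. The names KeyOrderSketch, inRange and longKey are hypothetical. The fixed-width keys are assumed to be 8-byte big-endian longs followed by a suffix, and only non-negative values are considered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: compares keys the way a byte-sorted table would order them.
final class KeyOrderSketch {

    // Lexicographic comparison with bytes treated as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // True when start <= key < stop, i.e. the key falls inside the scan range.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return compareUnsigned(key, start) >= 0 && compareUnsigned(key, stop) < 0;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Fixed-width key: an 8-byte big-endian long followed by an ASCII suffix.
    static byte[] longKey(long id, String suffix) {
        byte[] tail = ascii(suffix);
        return ByteBuffer.allocate(8 + tail.length).putLong(id).put(tail).array();
    }

    public static void main(String[] args) {
        // Variable-length ASCII keys: the range ['foo', 'foo:') picks up foo123:bar.
        System.out.println(inRange(ascii("foo123:bar"), ascii("foo"), ascii("foo:")));
        // ...and it does not contain foo:bar, which sorts after 'foo:'.
        System.out.println(inRange(ascii("foo:bar"), ascii("foo"), ascii("foo:")));
        // Fixed-width keys: [123, 124) holds every key whose first 8 bytes encode 123.
        System.out.println(inRange(longKey(123, "foo"), longKey(123, ""), longKey(124, "")));
        System.out.println(inRange(longKey(1234, "foo"), longKey(123, ""), longKey(124, "")));
    }
}
```

The digit '1' sorts before ':' in ASCII, which is why foo123:bar lands between 'foo' and 'foo:'. With fixed-width keys the id always occupies the same 8 bytes, so a longer number can never slip into the range.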
>>> On Tue, Aug 17, 2010 at 5:59 AM, Andrey Stepachev <[EMAIL PROTECTED]>
>>> wrote:
>>>> Use scan where start key is <first_half_of_key> itself as bytearray, and
>>>> stop key is <first_half_of_key> with last byte in bytearray + 1.
>>>>
>>>> example
>>>> abc% should be scan(abc, abd)
>>>>
>>>> 2010/8/17 Michelan Arendse <[EMAIL PROTECTED]>:
>>>>> Hi
>>>>>
>>>>> I am not sure if this is possible in HBase. What I am trying to do is
>>>>> scan on a HBase table with something similar to how SQL would do it.
>>>>> e.g. SELECT *
>>>>>         FROM <table>
>