HBase, mail # user - Re: row filter - binary comparator at certain range


Michael Segel 2013-10-21, 11:36

Let's look at what you are trying to do...

You want to store data where the key is a timestamp (long datatype).
You prefix it with a salt value, either 1-10 or 0-9; your example doesn't say which...
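A minimal sketch of the key being described. It assumes a one-byte salt in the range 0-9 computed as `timestamp % 10` (the thread does not say how the salt is chosen), followed by the timestamp as an 8-byte big-endian long. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical salted row key: [1-byte salt][8-byte big-endian timestamp]. */
public class SaltedKey {
    static final int BUCKETS = 10; // assumption: salt values 0-9

    /** Assumed salting rule; the original thread does not specify one. */
    static byte salt(long timestamp) {
        return (byte) Math.floorMod(timestamp, (long) BUCKETS);
    }

    /** Big-endian encoding makes non-negative longs sort numerically when bytes are compared unsigned. */
    static byte[] rowKey(long timestamp) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(salt(timestamp))   // salt first, so rows spread over 10 key ranges
                .putLong(timestamp)     // timestamp second, still increasing within each range
                .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(15L);
        System.out.println(key.length + " bytes, salt " + key[0]); // 9 bytes, salt 5
    }
}
```

Note that within any one salt value the timestamps still only increase, which is what the next point is about.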

You have a couple of problems with your choice of key...

First, after your initial 10 splits, you will still end up writing everything to one end of each salted key range, because the timestamps only ever increase.
This means that when a region splits, all writes will still go to the newer half, leaving the regions behind it at 1/2 the size they could be, with the exception of the last region in each of your salted ranges (in this case 10 of them), which will grow and then split.
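The split behaviour can be illustrated with a toy model of a single salt bucket. It assumes, hypothetically, that a region splits exactly in half once it holds `maxRows` rows and that keys only ever increase, so every write lands in the newest region. Real HBase split policies are size-based and more involved; this only shows the shape of the problem.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: row counts per region for one salt bucket with monotonically increasing keys. */
public class SplitSimulation {
    static List<Integer> simulate(int writes, int maxRows) {
        List<Integer> regions = new ArrayList<>();
        regions.add(0);
        for (int i = 0; i < writes; i++) {
            int last = regions.size() - 1;
            // An increasing key sorts after every existing key, so it always hits the last region.
            regions.set(last, regions.get(last) + 1);
            if (regions.get(last) >= maxRows) {
                int size = regions.get(last);
                // Split at the midpoint: the lower half is frozen, the upper half keeps taking writes.
                regions.set(last, size / 2);
                regions.add(size - size / 2);
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        System.out.println(simulate(33, 10)); // [5, 5, 5, 5, 5, 8]
    }
}
```

Every region except the newest one stays at half of `maxRows`, which is the "1/2 the size" effect described above.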

Is this a bad thing? Maybe yes, maybe no.

The issue is that you will then have to run 10 scans, one per salt value, each with its own start key and stop key, to get your timestamp range.
That would work, however...
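To make the 10-scan approach concrete for the question quoted below (timestamps 15-25), here is a sketch assuming the hypothetical layout of a one-byte salt (0-9) followed by an 8-byte big-endian timestamp. It only computes the byte ranges; each start/stop pair would then go to its own HBase `Scan` (for example `withStartRow`/`withStopRow` in the 2.x client, or `setStartRow`/`setStopRow` in older clients), and the caller merges the 10 result streams.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Computes one [start, stop) row range per salt bucket for a timestamp range. */
public class SaltedRangeScan {
    static final int BUCKETS = 10; // assumption: salt values 0-9

    /** Hypothetical key layout; assumes non-negative timestamps so byte order matches numeric order. */
    static byte[] key(int salt, long timestamp) {
        return ByteBuffer.allocate(1 + Long.BYTES).put((byte) salt).putLong(timestamp).array();
    }

    /** Returns BUCKETS pairs {startRow, stopRow}; the stop row is exclusive, as in an HBase Scan. */
    static List<byte[][]> ranges(long fromInclusive, long toExclusive) {
        List<byte[][]> result = new ArrayList<>();
        for (int salt = 0; salt < BUCKETS; salt++) {
            result.add(new byte[][] { key(salt, fromInclusive), key(salt, toExclusive) });
        }
        return result;
    }

    /** True if the row falls in any range, comparing bytes as unsigned (lexicographic) values. */
    static boolean inAnyRange(byte[] row, List<byte[][]> ranges) {
        for (byte[][] r : ranges) {
            if (Arrays.compareUnsigned(row, r[0]) >= 0 && Arrays.compareUnsigned(row, r[1]) < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The thread's example: timestamps 15-25 inclusive, i.e. [15, 26).
        List<byte[][]> ranges = ranges(15L, 26L);
        System.out.println(ranges.size() + " scans");            // 10 scans
        System.out.println(inAnyRange(key(2, 19L), ranges));    // true
        System.out.println(inAnyRange(key(2, 12L), ranges));    // false
    }
}
```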

1) Justify why you want/need to use a timestamp as the row key.

Tell us more about the use case and why you need this access pattern.

Salting is bad in that the salt is disassociated from the underlying key.
Taking the key's hash, truncating it and prepending it to the key (if "prepending" is an actual word) gives you a randomly distributed key, yet if you know the row key you can compute the hash and rebuild the full key.
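A minimal sketch of the hash-prefix alternative. The prefix is derived from the key itself, so anyone who knows the timestamp can rebuild the full row key for a point lookup. MD5 and a 2-byte prefix are assumptions chosen for illustration; the thread does not prescribe a hash or prefix length, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical row key: [truncated MD5 of the natural key][natural key]. */
public class HashPrefixedKey {
    static final int PREFIX_BYTES = 2; // assumption: keep the first 2 bytes of the hash

    static byte[] rowKey(long timestamp) {
        byte[] natural = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(natural);
            return ByteBuffer.allocate(PREFIX_BYTES + natural.length)
                    .put(hash, 0, PREFIX_BYTES) // truncated hash: spreads rows across regions
                    .put(natural)               // original key, kept intact after the prefix
                    .array();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Deterministic: the same timestamp always yields the same full row key.
        System.out.println(Arrays.equals(rowKey(15L), rowKey(15L))); // true
    }
}
```

The trade-off is that consecutive timestamps end up under unrelated prefixes, so this layout favours point lookups by known key over range scans by timestamp.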

My suggestion is that you rethink your key...

On Oct 20, 2013, at 11:31 PM, Tony Duan <[EMAIL PROTECTED]> wrote:

> Alex Vasilenko <aa.vasilenko@...> writes:
>
>>
>> Lars,
>>
>> But how will it behave when I have a salt at the beginning of the key to
>> properly shard the table across regions? Imagine a row key of the format
>> salt:timestamp and rows that go like this:
>> ...
>> 1:15
>> 1:16
>> 1:17
>> 1:23
>> 2:3
>> 2:5
>> 2:12
>> 2:15
>> 2:19
>> 2:25
>> ...
>>
>> And I want to find all rows whose second part (timestamp) is in the range
>> 15-25. What startKey and endKey should be used?
>>
>> Alexandr Vasilenko
>>
>> 2012/2/9 lars hofhansl <lhofhansl@...>
> Hi Alexandr Vasilenko,
> Have you ever resolved this issue? I am also facing this issue.
> I also want to implement this functionality.
> Imagine a row key of the format
> salt:timestamp and rows that go like this:
> ...
> 1:15
> 1:16
> 1:17
> 1:23
> 2:3
> 2:5
> 2:12
> 2:15
> 2:19
> 2:25
> ...
>
> And I want to find all rows whose second part (timestamp) is in the range
> 15-25.
>
> Could you please tell me how you resolved this?
> Thanks in advance.
>
>
> Tony Duan
>
>

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com