Re: HBASE - select distinct query against the rowkey
There is no concept of a transaction in the NoSQL world. At least not in HBase, and not across tables.

All writes to a single row are atomic. Note that you *could* hold a lock; however, it's not really a good idea for a client to hold one.

Don't know if it's really a problem though...
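
For what it's worth, here is a minimal sketch of why the missing cross-table transaction may not hurt in this case. It is plain Python with dicts standing in for the two tables, not the HBase client API, and the names `users`, `user_ids`, `put_message`, and `distinct_user_ids` are made up for illustration. The assumption is that the small user-id table is written first: that put is idempotent, so a failure before the second write leaves at worst a user id with no message yet, and the client can retry the whole operation.

```python
# In-memory stand-ins for two HBase tables (assumption: not the real client API).
users = {}      # rowkey "userid_messageid_timestamp" -> message body
user_ids = {}   # rowkey userid -> empty marker; one row per distinct user


def put_message(userid, messageid, timestamp, body, fail_second_write=False):
    """Write the user-id row first, then the message row.

    Each single-row put is atomic on its own; the pair of puts is not.
    """
    # Idempotent: repeating this put for the same userid changes nothing.
    user_ids[userid] = b""
    if fail_second_write:
        # Simulate the client dying between the two puts.
        raise IOError("simulated failure before the message put")
    users[f"{userid}_{messageid}_{timestamp}"] = body


def distinct_user_ids():
    """Rough equivalent of `select distinct(userid) from users`: read the small table."""
    return sorted(user_ids)


put_message("u1", "m1", 1356012000, "hello")
try:
    put_message("u2", "m2", 1356012060, "lost", fail_second_write=True)
except IOError:
    # Retry the whole operation; the user-id put is harmless to repeat.
    put_message("u2", "m2", 1356012060, "lost")
put_message("u1", "m3", 1356012120, "again")
```

With this ordering the user-id table can briefly list a user whose message has not landed yet, but it should never miss a user who has messages, which is usually the safer direction for a distinct-users lookup.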

HTH

-Mike

On Dec 20, 2012, at 10:08 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:

> Thanks Michael,
>
>> Not sure why you have the timestamp in the key... assuming the message id
>> is incremented, rows would be in time order anyway.
>
> I will need to run queries like "give me the messages from timestamp1 to
> timestamp2".
>
>> You will want to use a separate table.
> That's what I thought as well. If I don't have a separate table, I will
> end up scanning the whole table. But how about atomicity? What if a write
> succeeds on one table and fails on the other? HBase has no concept of a
> transaction in this case.
>
> Shengjie
>
>
> On 20 December 2012 15:59, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Not sure why you have the timestamp in the key... assuming the message id
>> is incremented, rows would be in time order anyway.
>>
>> But to answer your question...
>> You will want to use a separate table.
>>
>> In both instances you will end up doing a full table scan; however, the
>> number of rows in a distinct-user table would be much smaller than in your
>> users table.
>>
>>
>> HTH
>>
>> -Mike
>>
>> On Dec 20, 2012, at 8:55 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:
>>
>>> I have an HBase table called "users"; the rowkey consists of three parts:
>>>
>>>  1. userid
>>>  2. messageid
>>>  3. timestamp
>>>
>>> rowkey looks like: ${userid}_${messageid}_${timestamp}
>>>
>>> Given I can hash the userid and make the length of the field fixed, is
>>> there any way I can run a query like this SQL query:
>>>
>>> select distinct(userid) from users
>>>
>>> If the rowkey doesn't allow me to query like this, does that mean I need to
>>> create a separate table that just contains all the user ids? I guess if I do
>>> something like that, it won't be atomic anymore when I insert a record,
>>> because I am dealing with two tables without a transaction.
>>> --
>>> All the best,
>>> Shengjie Min
>>
>>
>
>
> --
> All the best,
> Shengjie Min
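
For comparison, the scan-based alternative raised in the original question can be sketched as below. This is plain Python over a sorted list standing in for the "users" table, not HBase client code. The MD5 hash, the 32-character prefix length, and the helper names `make_rowkey` and `distinct_prefixes` are assumptions for illustration only. Because rows are kept sorted by rowkey, all rows for one hashed userid sit next to each other, so a single pass can emit each prefix once. It is still a full scan of every message row, and it yields hashes rather than the original user ids, which would have to be stored separately.

```python
import hashlib
from itertools import groupby

PREFIX_LEN = 32  # length of an MD5 hex digest; the hash choice is an assumption


def make_rowkey(userid, messageid, timestamp):
    """Build a rowkey of the form <hashed userid>_<messageid>_<timestamp>."""
    prefix = hashlib.md5(userid.encode("utf-8")).hexdigest()
    return f"{prefix}_{messageid}_{timestamp}"


def distinct_prefixes(sorted_rowkeys):
    """Full scan over sorted rowkeys, emitting each fixed-length prefix once."""
    return [
        prefix
        for prefix, _ in groupby(sorted_rowkeys, key=lambda key: key[:PREFIX_LEN])
    ]


# A sorted list stands in for the table, since HBase keeps rows ordered by rowkey.
rowkeys = sorted([
    make_rowkey("alice", "m1", 1356012000),
    make_rowkey("bob", "m2", 1356012060),
    make_rowkey("alice", "m3", 1356012120),
])
prefixes = distinct_prefixes(rowkeys)
```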