HBase >> mail # user >> HBASE - select distinct query against the rowkey


Re: HBASE - select distinct query against the rowkey
There is no concept of a transaction in the NoSQL world.  At least not in HBase.

All writes to a single row are atomic. Note that you *could* hold a lock; however, it is not really a good idea for a client to hold a lock.

Don't know if it's really a problem though...
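
To make the two-table concern concrete, here is a minimal sketch of one way to order the writes when there is no transaction. It is an assumption-laden illustration, not the HBase client API: the dicts `users` and `distinct_users`, the `put` helper, and the `fail` flag are all hypothetical in-memory stand-ins. The idea is to write the small user-id row first and the message row second, so a failure in between leaves at worst a user id with no messages yet, and a retry repairs it.

```python
# Hypothetical in-memory stand-ins for two HBase tables (not the real client API).
users = {}           # row key "userid_messageid_timestamp" -> message body
distinct_users = {}  # row key userid -> empty marker value


class WriteFailed(Exception):
    """Raised by the stand-in put() to simulate a failed write."""


def put(table, row_key, value, fail=False):
    # A single-row write either applies completely or not at all.
    if fail:
        raise WriteFailed(row_key)
    table[row_key] = value


def insert_message(userid, messageid, timestamp, body, fail_second=False):
    # 1. Write the user-id row first. Writing it again later is harmless
    #    because the put is idempotent.
    put(distinct_users, userid, b"")
    # 2. Write the message row. If this fails, the caller retries the
    #    whole insert.
    put(users, f"{userid}_{messageid}_{timestamp}", body, fail=fail_second)


# Simulate a failure between the two writes, then a retry.
try:
    insert_message("u1", "m1", 1356000000, b"hi", fail_second=True)
except WriteFailed:
    state_after_failure = (dict(distinct_users), dict(users))
    insert_message("u1", "m1", 1356000000, b"hi")
```

After the simulated failure the user-id table already holds `u1` while the message table is still empty; after the retry both tables agree. Whether a temporarily "empty" user id is acceptable depends on the application.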

HTH

-Mike

On Dec 20, 2012, at 10:08 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:

> Thanks Michael,
>
>> Not sure why you have timestamp in the key... assuming that message id
> would be incremented therefore rows would be in time order anyways.
>
> I will need to run queries like "give me the messages from timestamp1 to
> timestamp2".
>
>> You will want to use a separate table.
> That's what I thought as well. If I don't have a separate table, I will
> end up scanning the whole table. But how about atomicity? What if a write
> succeeds on one table but fails on the other? HBase has no concept
> of transactions in this case.
>
> Shengjie
>
>
> On 20 December 2012 15:59, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Not sure why you have timestamp in the key... assuming that message id
>> would be incremented therefore rows would be in time order anyways.
>>
>> But to answer your question...
>> You will want to use a separate table.
>>
>> In both instances you will end up doing a full table scan; however, the
>> number of rows in a distinct user table would be much smaller than in your
>> users table.
>>
>>
>> HTH
>>
>> -Mike
>>
>> On Dec 20, 2012, at 8:55 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:
>>
>>> I have an HBase table called "users"; the rowkey consists of three parts:
>>>
>>>  1. userid
>>>  2. messageid
>>>  3. timestamp
>>>
>>> rowkey looks like: ${userid}_${messageid}_${timestamp}
>>>
>>> Given I can hash the userid and make the length of the field fixed, is
>>> there any way I can do a query like the SQL query:
>>>
>>> select distinct(userid) from users
>>>
>>> If the rowkey doesn't allow me to query like this, does that mean I need to
>>> create a separate table that just contains all the user ids? I guess if I do
>>> something like that, it won't be atomic anymore when I insert a record,
>>> because I am dealing with two tables without a transaction.
>>> --
>>> All the best,
>>> Shengjie Min
>>
>>
>
>
> --
> All the best,
> Shengjie Min