Re: HBASE - select distinct query against the rowkey
Not sure why you have a timestamp in the key. Assuming the message id is incremented, the rows would be in time order anyway.

But to answer your question...
You will want to use a separate table.

Either way you will end up doing a full table scan; however, a table of distinct users would have far fewer rows than your users table, so the scan is much cheaper.
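
A minimal sketch of the idea, using plain Python dicts as stand-ins for the two HBase tables. The names `users`, `distinct_users`, and `put_message` are made up for illustration and are not HBase API, and the sketch assumes the userid itself contains no underscore:

```python
# In-memory stand-ins for two HBase tables; plain dicts, not the HBase client.
users = {}           # rowkey "${userid}_${messageid}_${timestamp}" -> message body
distinct_users = {}  # rowkey "${userid}" -> empty marker value


def put_message(userid, messageid, timestamp, body):
    """Write the user-id row first, then the message row.

    The user-id write is idempotent, so retrying after a failure is safe.
    If the second write fails, the worst case is a user id with no messages
    yet, rather than a user missing from the distinct table.
    """
    distinct_users[userid] = b""
    users[f"{userid}_{messageid}_{timestamp}"] = body


def distinct_from_side_table():
    """Full scan of the small table: one row per user."""
    return sorted(distinct_users)


def distinct_from_rowkeys():
    """Full scan of the big table: one row per message, deduplicated client-side."""
    return sorted({rowkey.split("_", 1)[0] for rowkey in users})


put_message("u1", "m1", 1356011700, "hello")
put_message("u1", "m2", 1356011760, "again")
put_message("u2", "m3", 1356011820, "hi")

# Both approaches return the same ids; only the number of rows scanned differs.
assert distinct_from_side_table() == distinct_from_rowkeys() == ["u1", "u2"]
print(len(users), "message rows scanned vs", len(distinct_users), "user rows scanned")
```

The two writes are still not atomic. Ordering them this way is one possible choice that makes a partial failure tolerable; whether that is acceptable depends on your application.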
HTH

-Mike

On Dec 20, 2012, at 8:55 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:

> I have an HBase table called "users" whose rowkey consists of three parts:
>
>   1. userid
>   2. messageid
>   3. timestamp
>
> rowkey looks like: ${userid}_${messageid}_${timestamp}
>
> Given that I can hash the userid and make the length of the field fixed, is
> there any way I can run something like this SQL query:
>
> select distinct(userid) from users
>
> If the rowkey doesn't allow me to query like this, does that mean I need to
> create a separate table that just contains all the user ids? I guess if I do
> something like that, inserting a record won't be atomic anymore,
> because I am dealing with two tables without a transaction.
> --
> All the best,
> Shengjie Min