HBase, mail # user - Secondary indexes suggestions


Re: Secondary indexes suggestions
Michael Segel 2012-08-14, 11:55
Ah... schema design...

Yes, you have identified both options, but just to add a twist: in the column name, prepend (epoch - timestamp) to the message id. This will put the messages in reverse chronological order (newest first).
The only drawback is that it's theoretically possible to create a row that exceeds your region size, since a single row cannot be split across regions.
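
A minimal sketch of the reverse-timestamp column name, in plain Python rather than the HBase client. It assumes "epoch" can be read as a fixed upper bound (here the largest signed 64-bit value), that timestamps are in milliseconds, and that message ids fit in 8 bytes; the helper name and sample values are hypothetical. It relies only on the fact that column qualifiers within a row are sorted by their raw bytes.

```python
import struct

# Assumption: "epoch" is taken to be a fixed upper bound, so that a newer
# timestamp produces a smaller prefix and therefore sorts first.
MAX_LONG = 2**63 - 1

def reverse_ts_qualifier(timestamp_ms, message_id):
    """Return 8-byte (MAX_LONG - timestamp) followed by an 8-byte message id."""
    reverse_ts = MAX_LONG - timestamp_ms
    # Big-endian unsigned encoding makes byte order match numeric order.
    return struct.pack(">QQ", reverse_ts, message_id)

# Hypothetical messages as (timestamp in ms, message id), oldest first.
messages = [(1344940000000, 1), (1344945000000, 8), (1344950000000, 10)]

# Sorting the raw bytes mimics how qualifiers are ordered inside one row.
qualifiers = sorted(reverse_ts_qualifier(ts, mid) for ts, mid in messages)
ordered_ids = [struct.unpack(">QQ", q)[1] for q in qualifiers]
print(ordered_ids)  # newest message first
```

The fixed-width encoding is what keeps the ordering correct; decimal strings of different lengths would not sort numerically.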
 
You could also do this if you use a composite key: hash the user_id, then append (epoch - timestamp), and then the message_id.
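
The composite key could look roughly like the sketch below. The choice of MD5, the 16 + 8 + 8 byte layout, and the function name are assumptions made for illustration, not something stated in this thread; "epoch" is again read as a fixed upper bound.

```python
import hashlib
import struct

MAX_LONG = 2**63 - 1   # assumed upper bound standing in for "epoch"
HASH_LEN = 16          # length of an MD5 digest in bytes

def composite_row_key(user_id, timestamp_ms, message_id):
    """hash(user_id) + (MAX_LONG - timestamp) + message_id, all fixed width."""
    user_hash = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return user_hash + struct.pack(">QQ", MAX_LONG - timestamp_ms, message_id)

# Hypothetical messages for user 88 as (timestamp in ms, message id).
sample = [(1344940000000, 1), (1344945000000, 8), (1344950000000, 255)]

# Row keys are sorted by raw bytes, so sorting here mimics scan order.
keys = sorted(composite_row_key(88, ts, mid) for ts, mid in sample)
newest_first = [struct.unpack(">QQ", k[HASH_LEN:])[1] for k in keys]
print(newest_first)
```

Because every part has a fixed width, no separator is needed, and all rows of one user share the same 16-byte prefix.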

You are correct that you have to scan many rows. However, you can bound the scan by using the user_id as the start key and the user_id + the first character after the separator as the end key, so the scanner only reads that user's rows.
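
A small simulation of that bounded scan over a sorted list of keys, using the <userId><SEPARATOR><messageId> layout from the quoted message. The separator byte and the sample rows are hypothetical, and the stop key is treated as exclusive. One variation from the description above: the start key here includes the separator, so that a user id such as 880 is not picked up when scanning for 88 with variable-length ids.

```python
from bisect import bisect_left

SEPARATOR = b"|"  # hypothetical separator byte

def scan_bounds(user_id):
    """Return (start, stop) keys covering exactly one user's rows."""
    prefix = user_id.encode("utf-8")
    start = prefix + SEPARATOR
    # Stop key: same prefix followed by the byte right after the separator.
    stop = prefix + bytes([SEPARATOR[0] + 1])
    return start, stop

def range_scan(sorted_keys, start, stop):
    """Return keys k with start <= k < stop, like a start/stop-bounded scan."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Hypothetical index rows for several users, kept in byte order.
rows = sorted([b"87|5", b"88|1", b"88|10", b"88|255", b"88|8", b"880|3", b"89|2"])

start, stop = scan_bounds("88")
result = range_scan(rows, start, stop)
print(result)
```

If the user_id is hashed, the same idea applies with the hashed prefix in place of the plain id.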

The only reason I would suggest hashing the key is to get a more even distribution of data across the cluster, but that's not really that important.

On Aug 14, 2012, at 6:44 AM, Lukáš Drbal <[EMAIL PROTECTED]> wrote:

> Hi,
>
> thanks a lot for all the responses.
>
> Otis: the filters from your link are great, I'll check them in my tests.
>
> Michael: I understand what secondary indexes are, but I still don't have
> an idea about an effective rowkey format. I'm OK with the delay in creating
> the secondary index and with atomicity; we don't need "realtime" data.
>
>
> Say I have 10 messages with ids 1, 8, 10, 255, ... from one user with
> id 88. I see only 2 options here for the rowkey in the secondary index:
>
> 1) a composite rowkey like <userId><SEPARATOR><messageId>
> 2) use userId as the rowkey and put messageId into cells
> Are there any others?
>
> With the first method, I must scan over many rows. What about
> startRow for the scanner? Can this scan be efficient?
>
> The second method needs a great many cells and I don't need them all at
> once, so this is IMHO a bad idea.
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> Lukas Drbal
>