HBase >> mail # user >> Re: Secondary indexes suggestions


lars hofhansl 2012-08-15, 00:08
Andrew Purtell 2012-08-15, 01:59
Michael Segel 2012-08-15, 02:38
Andrew Purtell 2012-08-15, 02:49
Michael Segel 2012-08-15, 03:01
lars hofhansl 2012-08-15, 03:57
Lukáš Drbal 2012-08-14, 11:44
Re: Secondary indexes suggestions
Ah... schema design...

Yes, you have identified both options... but just to add a twist to your option 2: in the column name, prepend (epoch - timestamp) to the message id. This will put the messages in reverse chronological order, newest first.
The only drawback to this is that it's theoretically possible to create a row which exceeds your region's size....
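
A minimal sketch of that column-name layout, in plain Python rather than the HBase client. It assumes "epoch" means the largest signed 64-bit value and that both numbers are stored as 8-byte big-endian values; `reverse_ts_qualifier` and `decode` are hypothetical helper names, and the timestamps are made up. Sorting the byte strings stands in for HBase's lexicographic ordering of column qualifiers.

```python
import struct

# Assumption: "epoch" is taken to be the largest signed 64-bit value.
MAX_LONG = 2**63 - 1


def reverse_ts_qualifier(timestamp_ms: int, message_id: int) -> bytes:
    """Qualifier = 8-byte (MAX_LONG - timestamp) followed by 8-byte message id."""
    return struct.pack(">qq", MAX_LONG - timestamp_ms, message_id)


def decode(qualifier: bytes):
    """Recover (timestamp_ms, message_id) from a qualifier."""
    reversed_ts, message_id = struct.unpack(">qq", qualifier)
    return MAX_LONG - reversed_ts, message_id


# Three messages from one user: (timestamp in ms, message id), oldest first.
messages = [(1344900000000, 1), (1344950000000, 8), (1344990000000, 10)]

# Byte-wise sort, as the store would order the columns within the row.
qualifiers = sorted(reverse_ts_qualifier(ts, mid) for ts, mid in messages)
ordered_ids = [decode(q)[1] for q in qualifiers]

print(ordered_ids)  # [10, 8, 1] -> newest message first
```

Big-endian encoding matters here: for non-negative values it makes byte order agree with numeric order, which is what turns the subtraction into a newest-first sort.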
 
You could also do this if you use a composite key: hash the user_id, then append (epoch - timestamp), and then the message_id.
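
As a rough illustration of the composite key, again in plain Python: the choice of MD5, the 16-byte digest as a fixed-width prefix, the 8-byte big-endian encodings, and the names `user_prefix` and `composite_row_key` are all assumptions made for this sketch, and the sample data is invented. A fixed-width hash prefix means no separator is needed between the parts.

```python
import hashlib
import struct

# Assumption: "epoch" is taken to be the largest signed 64-bit value.
MAX_LONG = 2**63 - 1


def user_prefix(user_id: int) -> bytes:
    """Fixed-width (16-byte) hash of the user id; MD5 is an arbitrary choice."""
    return hashlib.md5(str(user_id).encode("utf-8")).digest()


def composite_row_key(user_id: int, timestamp_ms: int, message_id: int) -> bytes:
    """Row key = hash(user_id) + (MAX_LONG - timestamp) + message_id."""
    return user_prefix(user_id) + struct.pack(
        ">qq", MAX_LONG - timestamp_ms, message_id
    )


# (user id, timestamp in ms, message id)
sample = [
    (88, 1344900000000, 1),
    (7, 1344910000000, 3),
    (88, 1344950000000, 10),
    (7, 1344960000000, 4),
    (88, 1344990000000, 255),
]

# Byte-wise sort, as the store would order the rows.
rows = sorted(composite_row_key(u, ts, m) for u, ts, m in sample)

# All rows of user 88 share one prefix, so they sit next to each other,
# and within that block the newest message comes first.
rows_88 = [r for r in rows if r.startswith(user_prefix(88))]
ids_88 = [struct.unpack(">qq", r[16:])[1] for r in rows_88]
print(ids_88)  # [255, 10, 1]
```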

You are correct that you have to scan many rows. However, you can bound the scan: use the user_id as the start key and, as the end key, the user_id followed by the character that sorts immediately after the separator.
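
The bounds can be sketched like this, with a sorted Python list standing in for the table and `bisect` standing in for the scanner (start row inclusive, stop row exclusive, as in an HBase scan). The separator byte, the zero-padded message ids, and the helper names `row_key`, `scan_bounds` and `scan` are assumptions for illustration. One caveat the sketch relies on: the separator has to sort below every byte that can appear in an id (or the ids have to be fixed-width), otherwise user 880 would fall inside the range for user 88.

```python
import bisect

# Assumption: a separator byte that sorts below every byte used in the ids.
SEPARATOR = b"\x00"


def row_key(user_id: str, message_id: int) -> bytes:
    """<userId><SEPARATOR><messageId>, message id zero-padded to sort numerically."""
    return user_id.encode("ascii") + SEPARATOR + f"{message_id:010d}".encode("ascii")


def scan_bounds(user_id: str):
    """Start = user_id; stop = user_id + the byte right after the separator."""
    start = user_id.encode("ascii")
    stop = start + bytes([SEPARATOR[0] + 1])
    return start, stop


def scan(sorted_rows, start, stop):
    """Return rows with start <= key < stop from an already sorted list."""
    lo = bisect.bisect_left(sorted_rows, start)
    hi = bisect.bisect_left(sorted_rows, stop)
    return sorted_rows[lo:hi]


table = sorted(
    [row_key("88", m) for m in (255, 1, 10, 8)]
    + [row_key("880", 3), row_key("87", 5), row_key("9", 2)]
)

start, stop = scan_bounds("88")
hits = scan(table, start, stop)
hit_ids = [int(r.split(SEPARATOR)[1]) for r in hits]
print(hit_ids)  # [1, 8, 10, 255] -> only user 88, nothing from 87, 880 or 9
```

The scan touches only the contiguous block of rows for that user, which is why scanning "many rows" is still cheap relative to the table size.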

The only reason I would hash the key is to get a more even distribution of data across the cluster, but that's not really that important.

On Aug 14, 2012, at 6:44 AM, Lukáš Drbal <[EMAIL PROTECTED]> wrote:

> Hi,
>
> thanks a lot for all the responses.
>
> Otis: the filters from your link are great, I'll check them in my tests.
>
> Michael: I understand what secondary indexes are, but I still don't have
> an idea of an effective rowkey format. I'm OK with a delay in creating
> the secondary index and with the atomicity limitations; we don't need
> "realtime" data.
>
>
> Say I have 10 messages with ids 1, 8, 10, 255, ... from one user with
> id 88. I see only 2 options here for the rowkey in the secondary index:
>
> 1) a composite rowkey like <userId><SEPARATOR><messageId>
> 2) use userId as the rowkey and put the messageIds into cells
> Are there any others?
>
> If I use the first method, I must scan over many rows. What about
> startRow for the scanner? Can this scan be efficient?
>
> The second method needs a great many cells, and I don't need them all
> at once, so this is IMHO a bad idea.
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> Lukas Drbal
>
Lukáš Drbal 2012-08-12, 12:45
Otis Gospodnetic 2012-08-13, 21:49
Michael Segel 2012-08-14, 00:28
lars hofhansl 2012-08-14, 00:42