HBase >> mail # user >> Using separator/delimiter in HBase rowkey?


Using separator/delimiter in HBase rowkey?
Hello,

I am looking for advice on the pros and cons of using a separator/delimiter
as part of an HBase row key.

Currently one of our user activity tables has a rowkey design of
"UserID^TimeStamp" with a separator of "^". (UserID is a string that won't
include '^').

This is designed for the two common use cases in our system:
(1) If we come from a context where the UserID is known, we can easily scan
for all of that user's activities with a startRowKey and stopRowKey.
(2) If we come from an external networked table where the row key of this
user activity table is stored and can be retrieved as activityRowKey, then
we can use the following code to parse out the UserID and do the same scan
as in (1):

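    // Bytes is org.apache.hadoop.hbase.util.Bytes
    // Row key layout is UserID^TimeStamp, so the UserID is everything before the first '^'.
    String activityRowKeyStr = Bytes.toString(activityRowKey);
    String userId = activityRowKeyStr.substring(0, activityRowKeyStr.indexOf('^'));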

Then I can set startRowKey and stopRowKey for the scan based on userId.
Here we get the benefit of having the UserID as part of the row key with the
separator (compared with another solution that stores the UserID as one of
the columns in the user activity table).
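
To make the scan bounds concrete, here is a minimal sketch of how the start and stop row keys could be derived from a UserID. The class and method names (UserScanRange, startRow, stopRow, userIdOf) are made up for illustration and are not part of HBase. It assumes row keys are UTF-8 strings of the form UserID^TimeStamp and that the stop row is exclusive, so the stop key simply uses the next character after '^' (which is '_').

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an HBase class): scan bounds for one user's activities.
final class UserScanRange {
    static final char SEPARATOR = '^';

    // Smallest possible row key for this user: "UserID^".
    static byte[] startRow(String userId) {
        return (userId + SEPARATOR).getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive upper bound: the separator incremented by one ('^' + 1 == '_').
    // Every key starting with "UserID^" sorts before "UserID_".
    static byte[] stopRow(String userId) {
        return (userId + (char) (SEPARATOR + 1)).getBytes(StandardCharsets.UTF_8);
    }

    // Recovers the UserID from a full activity row key (use case 2).
    static String userIdOf(byte[] activityRowKey) {
        String key = new String(activityRowKey, StandardCharsets.UTF_8);
        return key.substring(0, key.indexOf(SEPARATOR));
    }
}
```

The two byte arrays would then be handed to the scan as its start and stop rows. Keys of other users whose IDs merely begin with the same characters (for example "alice2^...") fall outside this range, because the byte after "alice" is not '^'.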

The reason I picked a separator after UserID is that the UserID value is not
always a fixed-length string. At one point I actually thought of using MD5
to hash the UserID and make it a fixed length. However, the possibility of
collisions and the overhead of applying the hash function made me pick the
separator "^" instead.
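
For comparison, the MD5 alternative I considered would look roughly like the sketch below. The class name HashedUserKey and the key layout (16-byte hash followed by an 8-byte timestamp) are assumptions for illustration only; this is not what we run today.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a fixed-length, hash-prefixed row key.
final class HashedUserKey {
    static final int MD5_LENGTH = 16;

    // MD5 always yields 16 bytes, whatever the length of the UserID.
    static byte[] md5(String userId) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: 16-byte hash, then the timestamp as 8 big-endian bytes.
    static byte[] rowKey(String userId, long timestamp) {
        return ByteBuffer.allocate(MD5_LENGTH + Long.BYTES)
                .put(md5(userId))
                .putLong(timestamp)
                .array();
    }
}
```

With this layout no separator is needed because the prefix length is fixed, but the UserID can no longer be read back out of the row key, and every key construction pays for one hash computation.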

My question:
(1) My argument is that using a separator is better than using an MD5 hash
value. Does that seem reasonable? Could you comment on other pros and cons
that I might have missed (as the basis for my argument)?

(2) On using a separator/delimiter, besides the requirement that the
separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
other requirements? Are there any particular separators/delimiters that are
better or worse than others?
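
The one requirement I do know about could be enforced when the key is built, along these lines. ActivityRowKey and build are hypothetical names, and the sketch assumes the timestamp is appended in its decimal string form.

```java
// Hypothetical key builder that rejects UserIDs containing the separator.
final class ActivityRowKey {
    static final char SEPARATOR = '^';

    // Builds "UserID^TimeStamp", failing fast if the UserID would make the key ambiguous.
    static String build(String userId, long timestamp) {
        if (userId.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException(
                    "UserID must not contain '" + SEPARATOR + "': " + userId);
        }
        return userId + SEPARATOR + timestamp;
    }
}
```

Checking at write time means the parsing code can safely treat the first '^' as the end of the UserID.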

thanks!

Jason
Shahab Yunus 2013-07-08, 15:17
Mike Axiak 2013-07-08, 15:14
Michael Segel 2013-07-08, 15:29
Ted Yu 2013-07-08, 15:40
Mike Axiak 2013-07-08, 15:36
Michael Segel 2013-07-08, 15:54
Mike Axiak 2013-07-08, 16:00
Michael Segel 2013-07-08, 16:25
Jason Huang 2013-07-09, 01:09
Ted Yu 2013-07-08, 15:58