Re: Using separator/delimiter in HBase rowkey?
I'm not saying this is a solution, or better in any way, but here is some
more food for thought: is there a maximum length for UserIDs? If so, you
could pad shorter UserIDs up to that fixed length. That uses more space,
but it can help with sorting as well.
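
A minimal sketch of the padding idea in plain Java, with no HBase dependency. The class name PaddedRowKey, the 16-character limit, the space pad character, and the 13-digit timestamp width are all assumptions made up for illustration. It also assumes ASCII UserIDs that never end in a space.

```java
import java.util.Locale;

final class PaddedRowKey {
    // Hypothetical limit; use whatever maximum your UserIDs actually have.
    static final int MAX_USER_ID_LEN = 16;
    // Assumed pad character; it sorts below letters and digits in ASCII.
    static final char PAD = ' ';

    /** Builds "UserID<padding>Timestamp" with both parts at a fixed width. */
    static String rowKey(String userId, long timestampMillis) {
        if (userId.length() > MAX_USER_ID_LEN) {
            throw new IllegalArgumentException("UserID too long: " + userId);
        }
        StringBuilder key = new StringBuilder(userId);
        while (key.length() < MAX_USER_ID_LEN) {
            key.append(PAD);
        }
        // Zero-pad so timestamps compare correctly as text
        // (assumes non-negative epoch millis of at most 13 digits).
        key.append(String.format(Locale.ROOT, "%013d", timestampMillis));
        return key.toString();
    }

    /** Recovers the UserID by cutting the fixed-width field and dropping the padding. */
    static String userIdOf(String rowKey) {
        int end = MAX_USER_ID_LEN;
        while (end > 0 && rowKey.charAt(end - 1) == PAD) {
            end--;
        }
        return rowKey.substring(0, end);
    }

    public static void main(String[] args) {
        String key = rowKey("bob", 1373293140000L);
        System.out.println("[" + key + "]");
        System.out.println(userIdOf(key)); // prints "bob"
    }
}
```

With this layout no separator is needed, since the UserID always occupies the first 16 characters. The cost is the extra bytes stored in every row key.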

Regards,
Shahab
On Mon, Jul 8, 2013 at 10:19 AM, Jason Huang <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am trying to get some advice on the pros/cons of using a
> separator/delimiter as part of an HBase row key.
>
> Currently one of our user activity tables has a rowkey design of
> "UserID^TimeStamp" with a separator of "^". (UserID is a string that won't
> include '^').
>
> This is designed for the two common use cases in our system:
> (1) If we come from a context where the UserID is known, we can do a scan
> easily for all the user activities with a startRowKey and stopRowKey.
> (2) If we come from an external networked table where the row key of this
> user activity table is stored and can be retrieved as activityRowKey, then
> we can use the following code to parse out the UserID and do the same scan
> as in (1):
>
>     String activityRowKeyStr = Bytes.toString(activityRowKey);
>     // The UserID is everything before the first '^'.
>     String userId = activityRowKeyStr.substring(0, activityRowKeyStr.indexOf("^"));
>
> Then I can set startRowKey and stopRowKey for the scan based on userId.
> Here we get the benefit of having the UserID as part of the row key with
> the separator (compared to another solution that stores the UserID as one
> of the columns in the user activity table).
>
> The reason I picked a separator after the UserID is that the UserID value
> is not always a fixed-length string. At one point I actually thought of
> using MD5 to hash the UserID and make it a fixed length. However, the
> possibility of collisions and the possible overhead of applying the hash
> function made me pick the separator "^".
>
> My question:
> (1) My argument is that using a separator is better than using an MD5
> hash value. Does that seem reasonable? Could you comment on other pros
> and cons that I might have missed (as the basis for my argument)?
>
> (2) On using a separator/delimiter, besides the requirement that the
> separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
> other requirements? Are there any special separators/delimiters that are
> better/worse than the average ones?
>
> thanks!
>
> Jason
>
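
For use case (1) in the quoted message, the scan bounds can be derived from the separator itself: the start row is "UserID^" and the stop row is the same bytes with the last byte incremented. The sketch below shows this in plain Java without the HBase client. The class and method names (RowKeyRange, startRow, stopRow, userIdOf) are hypothetical, and it assumes UserIDs never contain '^'. The resulting byte arrays are what would be handed to an HBase Scan as its start and stop rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowKeyRange {
    static final byte SEPARATOR = '^'; // 0x5E, as in the "UserID^TimeStamp" design

    /** Inclusive start row: the UserID followed by the separator. */
    static byte[] startRow(String userId) {
        byte[] id = userId.getBytes(StandardCharsets.UTF_8);
        byte[] start = Arrays.copyOf(id, id.length + 1);
        start[id.length] = SEPARATOR;
        return start;
    }

    /** Exclusive stop row: same prefix with the separator byte incremented. */
    static byte[] stopRow(String userId) {
        byte[] stop = startRow(userId);
        stop[stop.length - 1]++; // '^' (0x5E) becomes '_' (0x5F)
        return stop;
    }

    /** Use case (2): recover the UserID from a stored activity row key. */
    static String userIdOf(byte[] activityRowKey) {
        String key = new String(activityRowKey, StandardCharsets.UTF_8);
        int sep = key.indexOf('^');
        if (sep < 0) {
            throw new IllegalArgumentException("No separator in row key: " + key);
        }
        return key.substring(0, sep);
    }

    public static void main(String[] args) {
        byte[] stored = "alice^1373293140000".getBytes(StandardCharsets.UTF_8);
        String userId = userIdOf(stored);
        System.out.println(new String(startRow(userId), StandardCharsets.UTF_8)); // alice^
        System.out.println(new String(stopRow(userId), StandardCharsets.UTF_8));  // alice_
    }
}
```

One thing to weigh for question (2): '^' is byte 0x5E, which sorts above digits and uppercase letters but below lowercase letters. Rows for a user "ab" would therefore land after those of "abC" but before those of "abc". Per-user scans are unaffected. If global ordering by UserID matters, a separator that sorts below every UserID character (for example a 0x00 byte, assuming it cannot occur in a UserID) avoids that.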