HBase >> mail # user >> HBase table row key design question.

Re: HBase table row key design question.
Hello Sir,

     Although we should always try to keep the rowkey as short as
possible, a short key that doesn't help with faster data access is of
no use either. So it depends entirely on the particular use case. In
your case, how about using the "phone number" as the rowkey? As long as
it is unique to each user, you will always get the correct result with a
much shorter rowkey. The only catch is that you will have to ask for the
user's phone number instead of name and DOB.
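
To make the phone number idea concrete, here is a minimal sketch of how
such a rowkey could be derived. It is only an illustration: the function
name phone_row_key, the digits-only normalization, and the default
country code "1" are my assumptions and not something from the original
question. The point is that the same number typed in different formats
should always map to the same rowkey.

```python
import re


def phone_row_key(raw_phone: str, default_country_code: str = "1") -> bytes:
    """Build a compact row key from a phone number (hypothetical helper).

    Assumption: stripping everything except ASCII digits is enough to
    normalize the input, and a 10-digit result is a national number that
    is missing its country code.
    """
    digits = re.sub(r"[^0-9]", "", raw_phone)
    if not digits:
        raise ValueError("phone number contains no digits")
    if len(digits) == 10:
        digits = default_country_code + digits
    # HBase row keys are plain byte arrays, so return bytes.
    return digits.encode("ascii")


if __name__ == "__main__":
    print(phone_row_key("(212) 555-0123"))
    print(phone_row_key("+1 212-555-0123"))
```

Both calls above produce the same 11-byte key, which is much shorter than
the 30+ character composite key in the original proposal.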

Regards,
    Mohammad Tariq

On Tue, Oct 2, 2012 at 7:58 PM, Jason Huang <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am designing an HBase table for users and hope to get some
> suggestions for my row key design. Thanks...
>
> This user table will have columns that hold user information such
> as names, birthday, gender, address, phone number, etc. The first
> time a user comes to us we will ask for all this information and
> generate a new row in the table with a unique row key. The next time
> the same user comes in we will ask for his/her name and birthday,
> and our application should quickly get the row(s) in the table that
> match the name and birthday provided.
>
> Here is what I am thinking as row key:
>
> {first 6 characters of user's first name}_{first 6 characters of
> user's last name}_{birthday in MMDDYYYY}_{timestamp when user comes
> in for the first time}
>
> However, I have a few questions about this row key:
>
> (1) Although it is not very likely, there is a small chance that two
> users with the same name and birthday come in on the same day, and
> that the two requests to generate a new user arrive at the same time
> (the timestamps are defined in the HTable API and happen to have the
> same value before the put method is called). This means the row key
> design above won't guarantee a unique row key. Any suggestions on how
> to modify it to ensure a unique ID?
>
> (2) Sometimes we will only have part of the user's first name and/or
> last name. In that case, we will need to perform a scan and return
> multiple matches to the client. To avoid scanning the whole table, if
> we have the user's first name, we can set the start/stop row
> accordingly. But if we only have the user's last name, we can't set
> up a good start/stop row. What's even worse, if the user provides a
> "sounds-like" first or last name, our scan won't be able to return
> good possible matches. Has anyone used names as part of the row key
> and encountered this type of issue?
>
> (3) The row key seems to be long (30+ chars). Will this affect our
> read/write performance? Maybe it will increase the storage a bit (say
> we have 3 million rows per month)? In other words, does the length of
> the row key matter a lot?
>
> thanks!
>
> Jason
>
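
For reference, the composite key from the quoted message and the
start/stop row idea from question (2) can be sketched as below. This is a
rough sketch under stated assumptions, not a definitive design: the
helper names user_row_key and prefix_scan_range are hypothetical, names
are lowercased before truncation, and a short random suffix is appended
to the timestamp as one possible answer to question (1). The suffix makes
a collision very unlikely but is not a strict guarantee; for that, an
atomic check on insert (for example HBase's checkAndPut) would be needed.

```python
import uuid
from typing import Optional, Tuple


def user_row_key(first: str, last: str, birthday_mmddyyyy: str,
                 created_ms: int, suffix: Optional[str] = None) -> bytes:
    """Build {first6}_{last6}_{MMDDYYYY}_{timestamp}_{suffix} as bytes."""
    if suffix is None:
        # Assumption: 8 random hex characters are enough to separate two
        # requests that arrive with an identical timestamp.
        suffix = uuid.uuid4().hex[:8]
    parts = [first.lower()[:6], last.lower()[:6],
             birthday_mmddyyyy, str(created_ms), suffix]
    return "_".join(parts).encode("utf-8")


def prefix_scan_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering all keys that begin with prefix.

    The stop row is exclusive. It is computed by dropping trailing 0xFF
    bytes and incrementing the last remaining byte.
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        # An empty stop row means "scan to the end of the table" in HBase.
        return prefix, b""
    stop[-1] += 1
    return prefix, bytes(stop)


if __name__ == "__main__":
    key = user_row_key("Jason", "Huang", "10021980", 1349186280000)
    start, stop = prefix_scan_range(b"jason_")
    print(key, start, stop, start <= key < stop)
```

The range returned for the prefix b"jason_" covers every key whose first
name part is exactly "jason", and a shorter prefix such as b"jas" covers
partial first names. A lookup by last name alone cannot use this key
layout, because the last name is not at the front of the key.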