HBase, mail # user - HBase table row key design question.


Re: HBase table row key design question.
Doug Meil 2012-10-02, 20:02

Hi there. While this doesn't answer your specific design questions,
this chapter in the RefGuide can be helpful for general schema design:

http://hbase.apache.org/book.html#schema

On 10/2/12 10:28 AM, "Jason Huang" <[EMAIL PROTECTED]> wrote:

>Hello,
>
>I am designing an HBase table for users and hope to get some
>suggestions for my row key design. Thanks...
>
>This user table will have columns for user information such as
>names, birthday, gender, address, phone number, etc. The first time a
>user comes to us, we will ask for all of this information and
>generate a new row in the table with a unique row key. The next time
>the same user comes in, we will ask for his/her name and birthday,
>and our application should quickly get the row(s) in the table that
>match the name and birthday provided.
>
>Here is what I am thinking as row key:
>
>{first 6 characters of user's first name}_{first 6 characters of
>user's last name}_{birthday in MMDDYYYY}_{timestamp when user comes
>in for the first time}
>
>However, I have a few questions about this row key:
>
>(1) Although it is not very likely, there is a small chance that two
>users with the same name and birthday come in on the same day, and
>that the two requests to create a new user arrive at the same time
>(the timestamps are defined in the HTable API and happen to have the
>same value before the put method is called). This means the row key
>design above won't guarantee a unique row key. Any suggestions on how
>to modify it to ensure a unique ID?
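
On question (1), one common way to make the key unique is to append a
random suffix after the timestamp. The sketch below is only an
illustration under stated assumptions: the class and method names
(UserRowKey, namePart, lookupPrefix, build) are hypothetical, and the
UUID suffix is an addition that is not part of the proposed key. It
uses only the Java standard library, not the HBase client.

```java
import java.util.UUID;

// Hypothetical helper, not part of any HBase API.
final class UserRowKey {

    // Keep at most the first 6 characters of a name.
    static String namePart(String name) {
        String n = name.trim();
        return n.length() <= 6 ? n : n.substring(0, 6);
    }

    // Prefix shared by every row with this name and birthday (MMDDYYYY).
    static String lookupPrefix(String first, String last, String birthday) {
        return namePart(first) + "_" + namePart(last) + "_" + birthday + "_";
    }

    // Full key: the proposed layout plus a random UUID suffix (assumption),
    // so two requests with identical inputs still produce different keys.
    static String build(String first, String last, String birthday,
                        long firstSeenMillis) {
        return lookupPrefix(first, last, birthday)
                + firstSeenMillis + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(build("Jonathan", "Smith", "01021980", now));
        System.out.println(build("Jonathan", "Smith", "01021980", now));
    }
}
```

Because the suffix comes last, a lookup by name and birthday can still
use the shared prefix. If the key has to stay exactly as proposed, a
conditional write such as HTable's checkAndPut might be an alternative,
at the cost of handling collisions in the application.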
>
>(2) Sometimes we will only have part of the user's first name and/or
>last name. In that case, we will need to perform a scan and return multiple
>matches to the client. To avoid scanning the whole table, if we have
>user's first name, we can set start/stop row accordingly. But then if
>we only have user's last name, we can't set up a good start/stop row.
>What's even worse, if the user provides a "sounds-like" first or last
>name, then our scan won't be able to return good possible matches.
>Has anyone used names as part of the row key and encountered this
>type of issue?
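
On question (2), the start/stop rows for a first-name prefix can be
derived mechanically, since an HBase scan includes the start row and
excludes the stop row. The following is a minimal sketch of that
computation using plain byte arrays; PrefixRange, startRow and stopRow
are hypothetical names, and passing the results to a scan is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any HBase API.
final class PrefixRange {

    // The start row is simply the prefix bytes (inclusive bound).
    static byte[] startRow(String prefix) {
        return prefix.getBytes(StandardCharsets.UTF_8);
    }

    // The stop row (exclusive bound) is the smallest key greater than
    // every key that begins with the prefix: drop trailing 0xFF bytes,
    // then increment the last remaining byte.
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // empty stop row: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = startRow("Jas");
        byte[] stop = stopRow(start);
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

This only helps when the known part is at the front of the key. For
last-name-only or "sounds-like" lookups, one option people use is a
second table keyed by last name or by a phonetic code that points back
to the main row key; that is a design assumption here, not something
this key layout gives you.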
>
>(3) The row key seems long (30+ chars). Will this affect our
>read/write performance? Will it noticeably increase storage (say
>we have 3 million rows per month)? In other words, does the length of
>the row key matter a lot?
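
On question (3), a rough back-of-the-envelope estimate may help. HBase
stores the row key with every cell, so the key length is multiplied by
the number of cells per row. The sketch below assumes 8 cells per row
(a made-up figure for names, birthday, gender, address, phone, etc.)
and ignores compression, block encoding and replication; RowKeyOverhead
and bytesPerMonth are hypothetical names.

```java
// Hypothetical estimate, not a measurement.
final class RowKeyOverhead {

    // Bytes spent on repeated row keys: one copy of the key per cell.
    static long bytesPerMonth(long rowsPerMonth, int cellsPerRow,
                              int rowKeyBytes) {
        return rowsPerMonth * cellsPerRow * rowKeyBytes;
    }

    public static void main(String[] args) {
        // 3 million rows/month and 30-byte keys are from the question;
        // 8 cells per row is an assumption.
        long bytes = bytesPerMonth(3_000_000L, 8, 30);
        System.out.println(bytes + " bytes per month"); // 720000000 bytes per month
    }
}
```

Under those assumptions the keys account for roughly 720 MB per month
before compression, so the length matters mostly as a storage
multiplier; the same reasoning applies to column family and qualifier
names.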
>
>thanks!
>
>Jason
>