HBase >> mail # user >> key design


Re: key design
Hi there-

Given the fact that the userid is in the lead position of the key in both
approaches, I'm not sure that he'd have a region hotspotting problem
because the userid should be able to offer some spread.
On 10/10/12 12:55 PM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:

>Hi:
>
>So you are saying you have ~3TB of data stored per day?
>
>Using the second approach, all data for one day will go to only 1
>regionserver no matter what you do because HBase doesn't split a single
>row.
>
>Using the first approach, data will spread across regionservers, but each
>regionserver will be hotspotted during writes since this is a
>time-series problem.
>
>Best Regards,
>
>Jerry
>
>On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[EMAIL PROTECTED]>
>wrote:
>
>> Hi,
>> I have a question about key and column design.
>> In my application we have 3,000,000,000 records every day.
>> Each record contains: user-id, timestamp, content (max 1 KB).
>> We need to store records for one year, which means we will have about
>> 1,000,000,000,000 records after one year.
>> We only search by user-id over a range of timestamps.
>> The table can be designed in two ways:
>> 1. key=userid-timestamp and column:=content
>> 2. key=userid-yyyyMMdd and column:HHmmss=content
>>
>>
>> In the first design we have a tall-narrow table with a very large
>> number of rows; in the second design we have a flat-wide table.
>> Which of them has better performance?
>>
>> Thanks.
>>
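
The volumes quoted in this thread can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it treats every record as a full 1 KB (taken here as 1000 bytes), assumes a 365-day year, and ignores key overhead, compression, and replication, none of which are stated in the original messages.

```python
# Figures taken from the original question.
RECORDS_PER_DAY = 3_000_000_000
MAX_CONTENT_BYTES = 1_000  # "max 1KB", assumed to mean 1000 bytes
DAYS_PER_YEAR = 365        # assumption: non-leap year

# Total records after one year (the question rounds this to 10^12).
records_per_year = RECORDS_PER_DAY * DAYS_PER_YEAR

# Upper bound on raw content written per day, in bytes.
bytes_per_day = RECORDS_PER_DAY * MAX_CONTENT_BYTES

# Average write rate if the load were spread evenly over the day.
writes_per_second = RECORDS_PER_DAY / (24 * 60 * 60)

if __name__ == "__main__":
    print(f"records per year: {records_per_year:,}")
    print(f"content per day:  {bytes_per_day / 10**12:.1f} TB (upper bound)")
    print(f"average writes/s: {writes_per_second:,.0f}")
```

This agrees with the ~3TB per day figure in the thread and gives roughly 1.1 trillion records per year.
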