HBase, mail # user - key design


Re: key design
Jerry Lam 2012-10-10, 19:08
That's true. Then there would be at most 86,400 records per day per userid,
which is roughly 100 MB per day. I don't see much difference between the two
approaches from the storage perspective.
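A quick back-of-the-envelope check of that estimate (a minimal sketch, assuming
at most one 1 KB record per second per user, which is what a second-resolution
HHmmss qualifier allows):

    // Rough storage estimate: one record per second per user, 1 KB each.
    public class StorageEstimate {
        public static void main(String[] args) {
            long recordsPerDay = 24L * 60 * 60;  // 86,400 seconds in a day
            long bytesPerRecord = 1024L;         // content is capped at 1 KB
            double mbPerUserPerDay =
                    recordsPerDay * bytesPerRecord / (1024.0 * 1024.0);
            // Prints "~84 MB per userid per day", i.e. on the order of 100 MB.
            System.out.printf("~%.0f MB per userid per day%n", mbPerUserPerDay);
        }
    }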

On Wed, Oct 10, 2012 at 1:09 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

> Hi there-
>
> Given that the userid is in the lead position of the key in both
> approaches, I'm not sure he'd have a region hotspotting problem, because
> the userid should offer some spread.
>
>
>
>
> On 10/10/12 12:55 PM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>
> >Hi:
> >
> >So you are saying you have ~3TB of data stored per day?
> >
> >Using the second approach, all of a user's data for one day will go to
> >a single regionserver no matter what you do, because HBase doesn't split
> >a single row across regions.
> >
> >Using the first approach, data will spread across regionservers, but
> >writes will still be hotspotted on each regionserver, since this is a
> >time-series problem.
> >
> >Best Regards,
> >
> >Jerry
> >
> >On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[EMAIL PROTECTED]>
> >wrote:
> >
> >> Hi,
> >> I have a question about key & column design.
> >> In my application we have 3,000,000,000 records every day.
> >> Each record contains: user-id, timestamp, content (max 1 KB).
> >> We need to store records for one year, which means we will have about
> >> 1,000,000,000,000 records after one year.
> >> We only search by user-id over a range of timestamps.
> >> The table can be designed in two ways:
> >> 1. key=userid-timestamp and column:=content
> >> 2. key=userid-yyyyMMdd and column:HHmmss=content
> >>
> >>
> >> In the first design we have a tall-narrow table with a very large
> >> number of rows; in the second design we have a flat-wide table.
> >> Which of them has better performance?
> >>
> >> Thanks.
> >>
>
>
>
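For reference, a minimal sketch of the two layouts from the original question,
written against the 0.94-era HBase client API; the table/family/qualifier
names, the fixed-width binary encoding, and the sample values are my own
assumptions for illustration:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyDesignSketch {
        private static final byte[] FAMILY = Bytes.toBytes("d"); // hypothetical family

        public static void main(String[] args) {
            long userId = 42L;                    // hypothetical user id
            long ts = System.currentTimeMillis(); // record timestamp
            byte[] content = new byte[1024];      // payload, up to 1 KB

            // Design 1 (tall-narrow): key = userid + timestamp, one row per record.
            // Fixed-width big-endian longs keep each user's rows contiguous and
            // time-sorted, which is what makes a timestamp-range scan cheap.
            Put tall = new Put(Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(ts)));
            tall.add(FAMILY, Bytes.toBytes("c"), content);

            // Design 2 (flat-wide): key = userid + yyyyMMdd, one row per user per
            // day, one column per second (qualifier HHmmss). The whole day lives
            // in a single row, and HBase never splits a row across regions.
            Put wide = new Put(Bytes.add(Bytes.toBytes(userId), Bytes.toBytes("20121010")));
            wide.add(FAMILY, Bytes.toBytes("190800"), content);
        }
    }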
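And a sketch of the "one user-id over a range of timestamps" read against
design 1, with the same illustrative names (this assumes non-negative ids and
timestamps, since Bytes.toBytes(long) is big-endian two's complement and
negative values would sort after positive ones under HBase's unsigned byte
comparison):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RangeScanSketch {
        // All records for one user with startTs <= timestamp < stopTs.
        public static Scan userRange(long userId, long startTs, long stopTs) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(startTs)));
            scan.setStopRow(Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(stopTs)));
            return scan;
        }
    }

With design 2, the same query would instead scan the rows from userid+startDay
through userid+stopDay and narrow the columns by HHmmss qualifier.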