HBase >> mail # user >> key design


Re: key design
That's true. Then there would be at most 86,400 records per day per userid.
That is about 100MB per day per userid, so I don't see much difference between
the two approaches from the storage perspective.
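
As a rough check of those figures (the record rate and size come from the
thread; the class itself is just an illustrative back-of-the-envelope
calculation, not from the original message):

// Back-of-the-envelope storage estimate; assumes at most one record per
// second per userid and ~1 KB of content per record (figures from the thread).
public class StorageEstimate {
    public static void main(String[] args) {
        long recordsPerUserPerDay = 24L * 60 * 60;   // 86,400 seconds in a day
        long bytesPerRecord = 1024L;                 // content is "max 1KB"

        double mbPerUserPerDay = recordsPerUserPerDay * bytesPerRecord / 1024.0 / 1024;
        System.out.printf("%.0f MB per userid per day%n", mbPerUserPerDay);  // ~84 MB, i.e. roughly 100MB

        long recordsPerDayTotal = 3000000000L;       // from the original question below
        double tbPerDayTotal = recordsPerDayTotal * bytesPerRecord / 1024.0 / 1024 / 1024 / 1024;
        System.out.printf("%.1f TB per day in total%n", tbPerDayTotal);      // ~2.8 TB, i.e. the ~3TB figure
    }
}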

On Wed, Oct 10, 2012 at 1:09 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

> Hi there-
>
> Given the fact that the userid is in the lead position of the key in both
> approaches, I'm not sure that he'd have a region hotspotting problem
> because the userid should be able to offer some spread.
>
>
>
>
> On 10/10/12 12:55 PM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>
> >Hi:
> >
> >So you are saying you have ~3TB of data stored per day?
> >
> >Using the second approach, all data for one day for a given userid will go
> >to only one regionserver no matter what you do, because HBase doesn't split
> >a single row.
> >
> >Using the first approach, data will spread across regionservers, but there
> >will be hotspotting on each regionserver during writes since this is a
> >time-series problem.
> >
> >Best Regards,
> >
> >Jerry
> >
> >On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[EMAIL PROTECTED]>
> >wrote:
> >
> >> Hi,
> >> I have a question about key & column design.
> >> In my application we have 3,000,000,000 records every day.
> >> Each record contains: user-id, "time stamp", content (max 1KB).
> >> We need to store records for one year, which means we will have about
> >> 1,000,000,000,000 records after one year.
> >> We only search by user-id over a range of "time stamp".
> >> The table can be designed in two ways:
> >> 1. key=userid-timestamp and column:=content
> >> 2. key=userid-yyyyMMdd and column:HHmmss=content
> >>
> >>
> >> In the first design we have a tall-narrow table but a very large number of
> >> records; in the second design we have a flat-wide table.
> >> Which of them has better performance?
> >>
> >> thanks.
> >>
>
>
>
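
For reference, below is a minimal sketch of the two row-key layouts described in
the quoted question above, written against the HBase client API (Put/Scan). The
table setup is omitted, and the column-family name, the qualifier for design 1,
and the method names are illustrative assumptions rather than anything specified
in the thread.

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyDesignSketch {

    // Illustrative column-family name; the thread does not name one.
    private static final byte[] FAMILY = Bytes.toBytes("d");

    // Design 1 (tall-narrow): one row per record, key = userid-timestamp.
    // The timestamp is zero-padded so rows sort chronologically within a userid.
    static Put tallNarrowPut(String userId, long ts, byte[] content) {
        byte[] row = Bytes.toBytes(String.format("%s-%013d", userId, ts));
        Put put = new Put(row);
        put.add(FAMILY, Bytes.toBytes("c"), content);   // a single cell per row
        return put;
    }

    // Design 2 (flat-wide): one row per userid per day, key = userid-yyyyMMdd,
    // with one column per second of the day, qualifier = HHmmss.
    static Put flatWidePut(String userId, long ts, byte[] content) {
        Date d = new Date(ts);
        byte[] row = Bytes.toBytes(userId + "-" + new SimpleDateFormat("yyyyMMdd").format(d));
        byte[] qualifier = Bytes.toBytes(new SimpleDateFormat("HHmmss").format(d));
        Put put = new Put(row);
        put.add(FAMILY, qualifier, content);            // up to 86,400 cells in one row
        return put;
    }

    // The stated query (all content for one userid over a timestamp range),
    // shown for design 1; design 2 would scan whole day rows instead.
    static Scan userRangeScan(String userId, long fromTs, long toTs) {
        byte[] startRow = Bytes.toBytes(String.format("%s-%013d", userId, fromTs));
        byte[] stopRow  = Bytes.toBytes(String.format("%s-%013d", userId, toTs));
        return new Scan(startRow, stopRow);
    }
}

In both layouts the userid stays in the lead position of the row key, which is
the spread across regions that Doug refers to above.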