HBase >> mail # user >> key design


Re: key design
Hi:

So you are saying you have ~3TB of data stored per day?

Using the second approach, all data for a given user on a given day will go
to only one regionserver no matter what you do, because HBase never splits a
single row across regions.

Using the first approach, data will spread across regionservers, but writes
will still hotspot on individual regionservers since this is a time-series
problem.
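The tradeoff between the two layouts can be sketched in plain Java, with no
HBase client needed (the user id and timestamps below are hypothetical, and
the exact separator/padding is one possible choice, not a prescribed format).
The point is that HBase sorts rows lexicographically by key bytes, so a
user-id-over-time-range lookup becomes a scan between two key prefixes:

```java
// Sketch of the two row-key layouts discussed in this thread.
// Hypothetical user id and timestamps; zero-padding makes lexicographic
// order match numeric order, which is what a range scan relies on.
public class KeyDesign {
    // Design 1: tall-narrow, one row per record: userid-timestamp
    static String tallKey(String userId, long tsMillis) {
        return String.format("%s-%013d", userId, tsMillis);
    }

    // Design 2: flat-wide, one row per user per day: userid-yyyyMMdd
    // (each HHmmss becomes a column qualifier inside that row)
    static String wideKey(String userId, String yyyyMMdd) {
        return userId + "-" + yyyyMMdd;
    }

    public static void main(String[] args) {
        // Under design 1, scanning user42 between two timestamps means
        // scanning the key range [start, stop); every key for user42 in
        // that window sorts between the two bounds.
        String start   = tallKey("user42", 1349800000000L);
        String stop    = tallKey("user42", 1349900000000L);
        String inRange = tallKey("user42", 1349850000000L);
        System.out.println(start.compareTo(inRange) < 0
                && inRange.compareTo(stop) < 0); // true

        // Under design 2, the scan granularity is whole days: one row
        // holds all of a user's records for that day.
        System.out.println(wideKey("user42", "20121010"));
    }
}
```

Design 1 turns a time-range query into a narrow scan but creates one row per
record; design 2 keeps a user's day in one row, which is why that row (and
its writes for the day) cannot be split across regionservers.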

Best Regards,

Jerry

On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[EMAIL PROTECTED]> wrote:

> hi
> i have a question about key & column design.
> in my application we have 3,000,000,000 records every day
> each record contains: user-id, "time stamp", content (max 1KB).
> we need to store records for one year, which means we will have about
> 1,000,000,000,000 records after 1 year.
> we only ever search one user-id over a range of "time stamp"
> the table can be designed in two ways
> 1.key=userid-timestamp and column:=content
> 2.key=userid-yyyyMMdd and column:HHmmss=content
>
>
> in the first design we have a tall-narrow table but very many rows; in the
> second design we have a flat-wide table.
> which of them has better performance?
>
> thanks.
>