HBase >> mail # user >> how to randomize the primary key which is a timestamp


Weishung Chung 2011-01-10, 15:33
Friso van Vollenhoven 2011-01-10, 15:50
Chirstopher Tarnas 2011-01-10, 16:05
Matt Corgan 2011-01-10, 16:08
Weishung Chung 2011-01-10, 16:20
Ted Dunning 2011-01-10, 16:30
Matt Corgan 2011-01-10, 16:41
Re: how to randomize the primary key which is a timestamp
Thank you for your prompt response. I am a bit confused about the prefix.
If I were to use a prefix for the timestamp key, how should I specify the
row key to search for at query time? How do I know which prefix was used
for a given timestamp and needs to be prepended to the timestamp for
querying?
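
A minimal sketch of one possible answer, assuming the hashed variant of the prefix suggested in the quoted replies below: if the prefix is computed from the timestamp itself rather than drawn at random, the same function can rebuild the full row key at query time. The names `NUM_BUCKETS`, `salt_for` and `row_key` are hypothetical, the CRC32 hash and the 8-byte big-endian layout are illustrative choices, and nothing here calls a real HBase API.

```python
import struct
import zlib

# Assumption: a one-byte prefix with values 0-63, as in the quoted reply.
NUM_BUCKETS = 64


def salt_for(timestamp_ms: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Derive the prefix from the timestamp so it can be recomputed when reading."""
    return zlib.crc32(struct.pack(">q", timestamp_ms)) % num_buckets


def row_key(timestamp_ms: int) -> bytes:
    """Build the row key: one prefix byte followed by the 8-byte big-endian timestamp."""
    return bytes([salt_for(timestamp_ms)]) + struct.pack(">q", timestamp_ms)
```

With this scheme, a single-timestamp lookup calls `row_key(ts)` and fetches exactly that row. With a truly random prefix there is nothing to recompute, so even a single-timestamp query would have to check every prefix value.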

On Mon, Jan 10, 2011 at 10:41 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> You can put them all in the same table.  If you prefix the keys when
> written, use a prefix filter when querying.  I would choose a prefix window
> that's about 4 times the number of nodes.
>
>
> On Mon, Jan 10, 2011 at 11:30 AM, Ted Dunning <[EMAIL PROTECTED]>
> wrote:
>
> > If multiple tables have the same key distribution and count, then they will
> > have similar split points for their regions, but the locations of the
> > regions will be randomized.
> >
> > I wouldn't worry about this until you see evidence it is a problem.
> >
> > On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Thank you for the replies.
> > > Most of the queries (70%) will scan a range of consecutive
> > > times, with some single-timestamp queries (30%).
> > > But there are multiple tables with the same range of timestamps. Will
> > > the same ranges of timestamps from multiple tables all be stored on the
> > > same region server, and if so, could that affect the performance of
> > > MapReduce jobs operating on those tables over the same time periods?
> > > Would hotspotting defeat the purpose of MapReduce?
> > >
> > > On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > You can also add a random (or hashed) prefix to the beginning of the key.
> > > > If your prefix were one byte with values 0-63, you've divided the hot spot
> > > > into 64 smaller ones, which is better for writing.  The downside is that if
> > > > you want to read a range of values, you will have to query all 64 "shards"
> > > > and merge the sorted values.  You can choose whatever prefix size is best
> > > > for your scenario.
> > > >
> > > >
> > > > On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Some options that I am aware of:
> > > > >
> > > > > reverse the byte order of the timestamp
> > > > > use UUIDs rather than a timestamp
> > > > > use hashing; whether this works really depends on your requirements
> > > > >
> > > > > On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > What is a good way to randomize the primary key, which is a timestamp,
> > > > > > in HBase to avoid hotspotting?
> > > > > > Thank you so much :)
> > > > > >
> > > > >
> > > >
> > >
> >
>
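
The read-side cost described in the quoted replies (query every "shard", then merge the sorted values) can be sketched as follows. This is only an illustration under stated assumptions: the table is a plain in-memory dict, `scan` is a hypothetical stand-in for a real range scan, and keys are assumed to be one prefix byte followed by an 8-byte big-endian, non-negative timestamp.

```python
import heapq
import struct

# Assumption: one-byte prefix with values 0-63, as in the quoted reply.
NUM_BUCKETS = 64


def shard_ranges(start_ms, stop_ms, num_buckets=NUM_BUCKETS):
    """Yield one (start_row, stop_row) pair per prefix value; stop_row is exclusive."""
    for bucket in range(num_buckets):
        prefix = bytes([bucket])
        yield prefix + struct.pack(">q", start_ms), prefix + struct.pack(">q", stop_ms)


def scan(table, start_row, stop_row):
    """Hypothetical stand-in for a store scan: rows in key order within the range."""
    return [(k, v) for k, v in sorted(table.items()) if start_row <= k < stop_row]


def range_query(table, start_ms, stop_ms):
    """Scan every shard, strip the prefix byte, and merge the results by timestamp."""
    per_shard = [
        [(struct.unpack(">q", k[1:])[0], v) for k, v in scan(table, lo, hi)]
        for lo, hi in shard_ranges(start_ms, stop_ms)
    ]
    # Each per-shard list is already sorted by timestamp, so a k-way merge suffices.
    return list(heapq.merge(*per_shard, key=lambda pair: pair[0]))
```

Each shard's rows come back in timestamp order because non-negative big-endian integers sort the same way as bytes and as numbers, so the merge step never needs a full re-sort. The number of scans per range query equals the number of prefix values, which is the trade-off the thread weighs against write distribution.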
Matt Corgan 2011-01-10, 17:04
Weishung Chung 2011-01-10, 17:42
Tost 2011-01-11, 00:18