HBase >> mail # user >> how to randomize the primary key which is a timestamp


Re: how to randomize the primary key which is a timestamp
Thanks a lot, this will get me started :D

On Mon, Jan 10, 2011 at 11:04 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> You could have prefix = timestamp % 64.  Then for a single key lookup, you
> could calculate the prefix and query just one shard.  For a scan, you have
> to query all shards and merge the results.
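
The lookup and scan described above can be sketched in plain Python, with a
sorted list standing in for the table. This is only an illustration under
stated assumptions: the 8-byte big-endian timestamp encoding and the names
row_key, shard_ranges, and merged_scan are hypothetical and not part of any
HBase API.

```python
import heapq
from bisect import bisect_left

NUM_SHARDS = 64  # prefix = timestamp % 64, as suggested in the thread


def row_key(ts: int) -> bytes:
    """One-byte shard prefix followed by the 8-byte big-endian timestamp."""
    return bytes([ts % NUM_SHARDS]) + ts.to_bytes(8, "big")


def shard_ranges(start_ts: int, stop_ts: int):
    """One (start_key, stop_key) pair per shard; the stop key is exclusive."""
    start, stop = start_ts.to_bytes(8, "big"), stop_ts.to_bytes(8, "big")
    return [(bytes([p]) + start, bytes([p]) + stop) for p in range(NUM_SHARDS)]


def merged_scan(sorted_keys, start_ts: int, stop_ts: int):
    """Scan every shard for [start_ts, stop_ts) and merge the sorted results."""
    per_shard = []
    for start, stop in shard_ranges(start_ts, stop_ts):
        lo, hi = bisect_left(sorted_keys, start), bisect_left(sorted_keys, stop)
        per_shard.append([int.from_bytes(k[1:], "big") for k in sorted_keys[lo:hi]])
    return list(heapq.merge(*per_shard))


# A sorted list of keys stands in for the table (assumption for illustration).
table = sorted(row_key(ts) for ts in range(1000, 1200))
```

A single-key lookup needs only row_key(ts), because the prefix is recomputed
from the timestamp itself. A range read goes through merged_scan, which
touches all 64 shards.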
>
>
> On Mon, Jan 10, 2011 at 11:56 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
> > Thank you for your prompt response. I am a bit confused about the prefix.
> > If I were to use a prefix for the timestamp key, how should I specify the
> > row key to search for at query time? How do I know which prefix was used
> > for a certain timestamp and needs to be added to the timestamp for
> > querying?
> >
> > On Mon, Jan 10, 2011 at 10:41 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >
> > > You can put them all in the same table.  If you prefix the keys when
> > > written, use a prefix filter when querying.  I would choose a prefix
> > > window that's about 4 times the number of nodes.
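
As a rough illustration of the sizing rule and the prefix filter mentioned
above, here is a small Python sketch. The 16-node cluster, the key encoding,
and the names prefix_count and with_prefix are assumptions made for this
example; with_prefix only imitates what a server-side prefix filter would do.

```python
def prefix_count(num_nodes: int, factor: int = 4) -> int:
    """Number of distinct prefixes: about `factor` times the node count."""
    return factor * num_nodes


def with_prefix(sorted_keys, prefix: bytes):
    """Keep only the keys that start with the given prefix."""
    return [k for k in sorted_keys if k.startswith(prefix)]


# Hypothetical cluster of 16 nodes, which gives 64 prefixes.
n = prefix_count(16)
keys = sorted(bytes([ts % n]) + ts.to_bytes(8, "big") for ts in range(256))

# Timestamps stored under prefix 3.
shard_3 = [int.from_bytes(k[1:], "big") for k in with_prefix(keys, bytes([3]))]
```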
> > >
> > >
> > > On Mon, Jan 10, 2011 at 11:30 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> > >
> > > > If multiple tables have the same key distribution and count, then they
> > > > will have similar split points for their regions, but the locations of
> > > > the regions will be randomized.
> > > >
> > > > I wouldn't worry about this until you see evidence it is a problem.
> > > >
> > > > On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thank you for the replies.
> > > > > Most of the queries (70%) will scan a range of consecutive times,
> > > > > with some single-timestamp queries (30%).
> > > > > But there are multiple tables with the same range of timestamps. Will
> > > > > the same ranges of timestamps from multiple tables all be stored on
> > > > > the same region server, and if so, could that affect the performance
> > > > > of MapReduce jobs (operating on those tables with the same range of
> > > > > time periods)? Would hotspotting defeat the purpose of MapReduce?
> > > > >
> > > > > On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > You can also add a random (or hashed) prefix to the beginning of the
> > > > > > key.  If your prefix were one byte with values 0-63, you've divided
> > > > > > the hot spot into 64 smaller ones, which is better for writing.  The
> > > > > > downside is that if you want to read a range of values, you will have
> > > > > > to query all 64 "shards" and merge the sorted values.  You can choose
> > > > > > whatever prefix size is best for your scenario.
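
To make the difference between a random and a hashed prefix concrete, here is
a small Python sketch. The SHA-256 hash, the 8-byte timestamp encoding, and
all function names are assumptions chosen for illustration. A hashed prefix
can be recomputed from the timestamp at read time, while a random prefix
cannot, so a single-timestamp read then has to try every prefix.

```python
import hashlib
import random

NUM_PREFIXES = 64  # one-byte prefix with values 0-63, as in the message above


def ts_bytes(ts: int) -> bytes:
    return ts.to_bytes(8, "big")


def random_key(ts: int) -> bytes:
    """Random prefix: spreads writes, but cannot be recomputed when reading."""
    return bytes([random.randrange(NUM_PREFIXES)]) + ts_bytes(ts)


def hashed_key(ts: int) -> bytes:
    """Hashed prefix: deterministic, so a reader can rebuild the exact key."""
    digest = hashlib.sha256(ts_bytes(ts)).digest()
    return bytes([digest[0] % NUM_PREFIXES]) + ts_bytes(ts)


def candidate_keys(ts: int):
    """Every key a timestamp could have been written under with a random prefix."""
    return [bytes([p]) + ts_bytes(ts) for p in range(NUM_PREFIXES)]
```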
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Some options that I am aware of:
> > > > > > >
> > > > > > > reverse the byte order of the timestamp
> > > > > > > use UUIDs rather than a timestamp
> > > > > > > use hashing; whether this works really depends on your requirements
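
The first two options listed above can be sketched in Python as follows. The
8-byte big-endian encoding, the use of a random (version 4) UUID, and the
function names are assumptions made for this example. Note that reversed
keys no longer sort in time order, so range scans over consecutive
timestamps are lost.

```python
import uuid


def reversed_key(ts: int) -> bytes:
    """Reverse the bytes so the fastest-changing byte comes first."""
    return ts.to_bytes(8, "big")[::-1]


def timestamp_of(key: bytes) -> int:
    """Recover the timestamp from a reversed key."""
    return int.from_bytes(key[::-1], "big")


def uuid_key() -> bytes:
    """A random 16-byte key; the timestamp would then be stored in a column."""
    return uuid.uuid4().bytes


# Consecutive timestamps now start with different bytes and spread out.
first_bytes = [reversed_key(ts)[0] for ts in (1, 2, 3)]
```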
> > > > > > >
> > > > > > > On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > What is a good way to randomize the primary key, which is a
> > > > > > > > timestamp, in HBase to avoid hotspotting?
> > > > > > > > Thank you so much :)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>