HBase user mailing list: how to randomize the primary key which is a timestamp


Weishung Chung 2011-01-10, 15:33
Friso van Vollenhoven 2011-01-10, 15:50
Chirstopher Tarnas 2011-01-10, 16:05
Matt Corgan 2011-01-10, 16:08
Weishung Chung 2011-01-10, 16:20
Ted Dunning 2011-01-10, 16:30
Matt Corgan 2011-01-10, 16:41
Re: how to randomize the primary key which is a timestamp
Thank you for your prompt response. I am a bit confused about the prefix.
If I were to use a prefix for the timestamp key, how should I specify the row
key at query time? How do I know which prefix was used for a given timestamp,
so that I can prepend it to the timestamp when querying?
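
A minimal sketch of one way to address this, assuming the prefix is derived deterministically from the timestamp (the "hashed" variant mentioned in the replies below) rather than chosen at random: a point lookup simply recomputes the prefix, and a range scan queries every bucket and merges the results. The names `NUM_NODES`, `encode_ts`, `salt_for`, `make_row_key` and `scan_range` are hypothetical, the bucket count follows the "about 4 times the number of nodes" rule of thumb from this thread, and a sorted Python list stands in for the table's sorted row keys, so no HBase client API is used.

```python
import bisect
import hashlib
import heapq
import struct

# Hypothetical cluster size; ~4 prefix values per node, per the rule of thumb in this thread.
NUM_NODES = 16
NUM_BUCKETS = 4 * NUM_NODES  # 64 buckets: a one-byte prefix with values 0-63


def encode_ts(ts_millis: int) -> bytes:
    # 8-byte big-endian; sorts correctly for non-negative epoch milliseconds.
    return struct.pack(">q", ts_millis)


def salt_for(ts_millis: int) -> int:
    """Derive the prefix from the timestamp itself, so a reader can recompute it."""
    return hashlib.md5(encode_ts(ts_millis)).digest()[0] % NUM_BUCKETS


def make_row_key(ts_millis: int) -> bytes:
    """Row key = 1-byte salt + 8-byte timestamp. Used for writes and point lookups."""
    return bytes([salt_for(ts_millis)]) + encode_ts(ts_millis)


def scan_range(sorted_keys: list, start_ts: int, stop_ts: int) -> list:
    """Return timestamps in [start_ts, stop_ts) by scanning each bucket and merging.

    `sorted_keys` stands in for the table's lexicographically sorted row keys;
    against a real table this would be one scan per bucket.
    """
    per_bucket = []
    for salt in range(NUM_BUCKETS):
        lo = bisect.bisect_left(sorted_keys, bytes([salt]) + encode_ts(start_ts))
        hi = bisect.bisect_left(sorted_keys, bytes([salt]) + encode_ts(stop_ts))
        # Keys inside one bucket are already in timestamp order.
        per_bucket.append([struct.unpack(">q", k[1:])[0] for k in sorted_keys[lo:hi]])
    # Merge the already-sorted per-bucket results into one time-ordered list.
    return list(heapq.merge(*per_bucket))


if __name__ == "__main__":
    timestamps = [1294671180000 + i * 1000 for i in range(1000)]
    table = sorted(make_row_key(ts) for ts in timestamps)
    print(scan_range(table, timestamps[10], timestamps[20]) == timestamps[10:20])  # prints True
```

With a purely random prefix the reader could not recompute it, so even single-timestamp reads would have to check every bucket; hashing the timestamp keeps point lookups to a single key while still spreading writes.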

On Mon, Jan 10, 2011 at 10:41 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> You can put them all in the same table.  If you prefix the keys when
> written, use a prefix filter when querying.  I would choose a prefix window
> that's about 4 times the number of nodes.
>
>
> On Mon, Jan 10, 2011 at 11:30 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> > If multiple tables have the same key distribution and count, then they will
> > have similar split points for their regions, but the locations of the
> > regions will be randomized.
> >
> > I wouldn't worry about this until you see evidence it is a problem.
> >
> > On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> >
> > > Thank you for the replies.
> > > Most of the queries (70%) will be scans over a range of consecutive
> > > times, with some single-timestamp queries (30%).
> > > However, there are multiple tables with the same range of timestamps. Will
> > > the same range of timestamps from multiple tables be stored on the same
> > > region server, and if so, could that affect the performance of MapReduce
> > > jobs that operate on those tables over the same time periods? Would
> > > hotspotting defeat the purpose of MapReduce?
> > >
> > > On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can also add a random (or hashed) prefix to the beginning of the key.
> > > > If your prefix were one byte with values 0-63, you would divide the hot
> > > > spot into 64 smaller ones, which is better for writing. The downside is
> > > > that if you want to read a range of values, you will have to query all 64
> > > > "shards" and merge the sorted values. You can choose whatever prefix size
> > > > is best for your scenario.
> > > >
> > > >
> > > > On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Some options that I am aware of:
> > > > >
> > > > > reverse the byte order of the timestamp
> > > > > use UUIDs rather than a timestamp
> > > > > use hashing; whether this works really depends on your requirements
> > > > >
> > > > > On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > What is a good way to randomize a primary key that is a timestamp in
> > > > > > HBase, to avoid hotspotting?
> > > > > > Thank you so much :)
> > > > > >
> > > > >
> > > >
> > >
> >
>
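
Christopher's first option, reversing the byte order of the timestamp, can be sketched as follows, assuming 8-byte millisecond timestamps; the function names and the sample timestamp are hypothetical and no HBase API is involved. The sketch shows the trade-off: an exact-timestamp lookup still works because the key is computed from the timestamp, but a time range no longer maps to a contiguous key range.

```python
import struct


def reversed_ts_key(ts_millis: int) -> bytes:
    """Row key = the 8-byte timestamp in little-endian (reversed) byte order.

    The fastest-changing byte now comes first, so consecutive writes
    spread across the key space instead of all landing at its end.
    """
    return struct.pack("<q", ts_millis)


def ts_from_key(key: bytes) -> int:
    """Invert reversed_ts_key; point lookups can always rebuild the key."""
    return struct.unpack("<q", key)[0]


if __name__ == "__main__":
    ts = 1294671180000  # hypothetical sample timestamp
    # 256 consecutive milliseconds produce 256 distinct leading bytes.
    print(len({reversed_ts_key(ts + i)[0] for i in range(256)}))  # prints 256
    # Key order no longer follows time order: here ts + 256 sorts between ts and ts + 1.
    print(reversed_ts_key(ts) < reversed_ts_key(ts + 256) < reversed_ts_key(ts + 1))  # prints True
```

Since most of the queries described in this thread are scans over consecutive times, that loss of contiguity is worth weighing against the write spreading it buys.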
Matt Corgan 2011-01-10, 17:04
Weishung Chung 2011-01-10, 17:42
Tost 2011-01-11, 00:18