HBase, mail # user - Rowkey design question


Re: Rowkey design question
Mohammad Tariq 2013-02-21, 22:25
Another good point.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Feb 22, 2013 at 3:45 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> An easier way is to place one byte, called a bucket, before the timestamp.
> You can calculate it by taking the timestamp modulo the number of buckets.
> We are now in the process of field testing it.
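
A minimal sketch of the bucket prefix described above, and of how a time-range scan might still be done with it. The bucket count of 16, the class and method names, and the key layout (1 bucket byte followed by the 8-byte big-endian timestamp) are assumptions for illustration, not details from this thread. The idea is that a time range turns into one start/stop key pair per bucket; the client would run one scan per pair and merge the results.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BucketedRowKey {
    // Hypothetical bucket count; the thread does not specify one.
    static final int NUM_BUCKETS = 16;

    // Assumed layout: [1 bucket byte][8-byte big-endian timestamp].
    static byte[] rowKey(long timestampMillis) {
        byte bucket = (byte) Math.floorMod(timestampMillis, (long) NUM_BUCKETS);
        return withBucket(bucket, timestampMillis);
    }

    // Builds a key for an explicit bucket; also used for scan boundaries.
    static byte[] withBucket(byte bucket, long timestampMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(bucket)
                .putLong(timestampMillis)
                .array();
    }

    // Returns one {startKey, stopKey} pair per bucket for [startMillis, stopMillis).
    static List<byte[][]> scanRanges(long startMillis, long stopMillis) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            ranges.add(new byte[][] {
                withBucket((byte) b, startMillis),
                withBucket((byte) b, stopMillis)
            });
        }
        return ranges;
    }
}
```

Each pair could be passed to `scan.setStartRow(...)` and `scan.setStopRow(...)`, giving `NUM_BUCKETS` scans per time range. With a plain modulo, timestamps that are exact multiples of the bucket count all fall into bucket 0, so the spread depends on how the timestamps are distributed.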
>
>
> On Tuesday, February 19, 2013, Paul van Hoven wrote:
>
> > Yeah it worked fine.
> >
> > But as I understand: If I prefix my row key with something like
> >
> > md5-hash + timestamp
> >
> > then the rowkeys are probably evenly distributed, but how would I
> > then perform a scan restricted to a specific time range?
> >
> >
> > 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> > > No, before the timestamp. All row keys that sort next to each other go
> > > to the same region. This is the default HBase behavior and is meant to
> > > improve performance. But sometimes a machine gets overloaded because
> > > reads and writes are concentrated on that particular machine, for
> > > example with time-series data. So it's better to hash the keys so that
> > > they are spread across all the machines equally. HTH
> > >
> > > BTW, did that range query work??
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hey Tariq,
> > >>
> > >> thanks for your quick answer. I'm not sure I got the idea in the
> > >> second part of your answer. Do you mean that if I use a timestamp as a
> > >> rowkey I should append a hash like this:
> > >>
> > >> 1357279200000+MD5HASH
> > >>
> > >> and then the data would be distributed more equally?
> > >>
> > >>
> > >> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> > >> > Hello Paul,
> > >> >
> > >> >     Try this and see if it works :
> > >> >        scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
> > >> >        scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
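
The suggestion above passes the timestamps as strings rather than as longs. A likely reason, assuming the row keys were written as text (as the shell output in the original question suggests), is that a string key and an 8-byte long key have different byte layouts, so they cannot be compared meaningfully. The sketch below illustrates this with plain Java; the class and method names are hypothetical, and the two helpers only mimic the encodings of `Bytes.toBytes(String)` (UTF-8) and `Bytes.toBytes(long)` (8 bytes, big-endian).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class KeyEncodingDemo {
    // Decimal text form of the timestamp, encoded as UTF-8 bytes.
    static byte[] asStringKey(long ts) {
        return Long.toString(ts).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width 8-byte big-endian form of the timestamp.
    static byte[] asLongKey(long ts) {
        return ByteBuffer.allocate(Long.BYTES).putLong(ts).array();
    }

    // Unsigned lexicographic byte comparison, the ordering row keys are sorted by.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // True if row falls inside [start, stop) under that ordering.
    static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return compare(row, start) >= 0 && compare(row, stop) < 0;
    }
}
```

With a text row key such as `1357020000000+192.168.178.9`, `inRange` is true for string boundaries built from 1357000000000 and 1357100000000, but false for the long-encoded boundaries: the long form starts with a zero byte, so every text key sorts after the stop key.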
> > >> >
> > >> > Also try not to use the timestamp (TS) alone as the rowkey, as it may
> > >> > lead to RegionServer (RS) hotspotting. Just add a hash to your rowkeys
> > >> > so that data is distributed evenly across all the RSs.
> > >> >
> > >> > Warm Regards,
> > >> > Tariq
> > >> > https://mtariq.jux.com/
> > >> > cloudfront.blogspot.com
> > >> >
> > >> >
> > >> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I'm currently playing with HBase. The design of the rowkey seems to be
> > >> >> critical.
> > >> >>
> > >> >> The rowkey for a certain database table of mine is:
> > >> >>
> > >> >> timestamp+ipaddress
> > >> >>
> > >> >> It looks something like this when performing a scan on the table in the
> > >> >> shell:
> > >> >> hbase(main):012:0> scan 'ToyDataTable'
> > >> >> ROW                                         COLUMN+CELL
> > >> >>  1357020000000+192.168.178.9                column=CF:SampleCol, timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
> > >> >>
> > >> >> Since I have several rows for different timestamps, I'd like to restrict
> > >> >> a scan to just one part of the table, for example from 2013-01-07 to
> > >> >> 2013-01-09. Previously I only had a timestamp as the rowkey and I
> > >> >> could restrict the scan like this:
> > >> >>
> > >> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
> > >> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
> > >> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
> > >> >>
> > >> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
> > >> >> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
> > >> >>                      Bytes.toBytes(endDate.getTime()));