HBase >> mail # user >> Rowkey design question


Paul van Hoven 2013-02-19, 16:11
Mohammad Tariq 2013-02-19, 16:16
Paul van Hoven 2013-02-19, 16:24
Mohammad Tariq 2013-02-19, 17:34
Paul van Hoven 2013-02-19, 17:50
Mohammad Tariq 2013-02-19, 17:54

Re: Rowkey design question
An easier way is to put a single byte, called a bucket, in front of the
timestamp. You calculate it as the timestamp modulo the number of
buckets. We are now in the process of field testing it.
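
A minimal sketch of the bucket idea in plain Java, not a tested
implementation. It assumes a binary key layout of one bucket byte followed
by the 8-byte big-endian timestamp, non-negative timestamps, and a made-up
bucket count of 16; the class and method names are hypothetical. It also
shows one way a time-range scan could still work: one start/stop row pair
per bucket, because rows from any time range end up in every bucket.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch: one-byte bucket prefix in front of a timestamp row key. */
final class BucketedRowKey {
    // Assumed bucket count; it has to fit in one byte and must not change
    // once rows have been written with it.
    static final int NUM_BUCKETS = 16;

    /** bucket = timestamp modulo the number of buckets. */
    static byte bucketFor(long timestampMillis) {
        return (byte) Math.floorMod(timestampMillis, (long) NUM_BUCKETS);
    }

    /** Key layout: [1-byte bucket][8-byte big-endian timestamp]. */
    static byte[] rowKey(long timestampMillis) {
        return boundary(bucketFor(timestampMillis), timestampMillis);
    }

    /** A given bucket byte plus a timestamp; also used as a scan boundary. */
    static byte[] boundary(int bucket, long timestampMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(timestampMillis)
                .array();
    }

    /**
     * One {startRow, stopRow} pair per bucket covering the time range
     * [startMillis, endMillis). Each pair is meant for its own scan.
     */
    static List<byte[][]> scanRanges(long startMillis, long endMillis) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            ranges.add(new byte[][] {
                boundary(b, startMillis),
                boundary(b, endMillis)
            });
        }
        return ranges;
    }
}
```

Each pair would go into its own scan as start row and stop row, and the
client merges the results. A suffix such as the IP address could be
appended after the timestamp without changing the boundaries.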
On Tuesday, February 19, 2013, Paul van Hoven wrote:

> Yeah it worked fine.
>
> But as I understand: If I prefix my row key with something like
>
> md5-hash + timestamp
>
> then the rowkeys are probably evenly distributed, but how would I
> then perform a scan restricted to a specific time range?
>
>
> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> > No, before the timestamp. Row keys that sort next to each other go to the
> > same region. This is the default HBase behavior and is meant to improve
> > performance. But sometimes one machine gets overloaded because reads and
> > writes concentrate on it, for example with time-series data. So it's
> > better to hash the keys so that they are spread evenly across all the
> > machines. HTH
> >
> > BTW, did that range query work??
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Hey Tariq,
> >>
> >> thanks for your quick answer. I'm not sure I got the idea in the
> >> second part of your answer. You mean if I use a timestamp as a rowkey I
> >> should append a hash like this:
> >>
> >> 1357279200000+MD5HASH
> >>
> >> and then the data would be distributed more equally?
> >>
> >>
> >> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> >> > Hello Paul,
> >> >
> >> >     Try this and see if it works :
> >> >        scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
> >> >        scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
> >> >
> >> > Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
> >> > Just add a hash to your rowkeys so that data is distributed evenly on
> >> > all the RSs.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> > https://mtariq.jux.com/
> >> > cloudfront.blogspot.com
> >> >
> >> >
> >> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <
> >> > [EMAIL PROTECTED]> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I'm currently playing with HBase. The design of the rowkey seems to be
> >> >> critical.
> >> >>
> >> >> The rowkey for a certain database table of mine is:
> >> >>
> >> >> timestamp+ipaddress
> >> >>
> >> >> It looks something like this when performing a scan on the table in the
> >> >> shell:
> >> >> hbase(main):012:0> scan 'ToyDataTable'
> >> >> ROW                                         COLUMN+CELL
> >> >>  1357020000000+192.168.178.9                column=CF:SampleCol, timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
> >> >>
> >> >> Since I have several rows for different timestamps, I'd like to restrict a
> >> >> scan to just one range of the table, for example from 2013-01-07 to
> >> >> 2013-01-09. Previously I only had a timestamp as the rowkey and I
> >> >> could restrict the scan like this:
> >> >>
> >> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
> >> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
> >> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
> >> >>
> >> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
> >> >> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
> >> >>         Bytes.toBytes(endDate.getTime()));
> >> >>
> >> >
Mohammad Tariq 2013-02-21, 22:25