Re: Rowkey design question
You can use FuzzyRowFilter
<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html>
to do that.

Have a look at this link:
<http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/>
You might find it helpful.
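
To make the idea concrete, here is a minimal sketch of the matching rule that FuzzyRowFilter documents: a mask byte of 0 marks a position that must equal the pattern, and a mask byte of 1 marks a position that may hold anything. It uses plain Java rather than the HBase classes, so it illustrates the semantics only and is not the filter's implementation. The key layout (a 4-character hash prefix followed by a 13-digit timestamp), the class name and the sample keys are assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatchSketch {

    // Returns true when every fixed position (mask byte 0) of the row equals
    // the pattern. Positions with mask byte 1 are ignored.
    public static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 4 hash characters, then the timestamp digits.
        // The '?' bytes are placeholders; the mask says they are not compared.
        byte[] pattern = "????13570".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0};

        byte[] inside = "a3f91357020000000".getBytes(StandardCharsets.US_ASCII);
        byte[] outside = "07bc1361288601717".getBytes(StandardCharsets.US_ASCII);

        System.out.println(matches(inside, pattern, mask));  // true
        System.out.println(matches(outside, pattern, mask)); // false
    }
}
```

Note that fixing the leading timestamp digits selects a block of timestamps, not an arbitrary start/end range, so the time window has to line up with a digit boundary or be covered by several patterns.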

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Feb 19, 2013 at 11:20 PM, Paul van Hoven <
[EMAIL PROTECTED]> wrote:

> Yeah it worked fine.
>
> But as I understand: If I prefix my row key with something like
>
> md5-hash + timestamp
>
> then the rowkeys are probably evenly distributed but how would I
> perform then a scan restricted to a special time range?
>
>
> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> > No, put the hash before the timestamp. Row keys that sort next to each
> > other go to the same region. This is the default HBase behavior and is
> > meant to improve performance. But sometimes one machine gets overloaded
> > because all the reads and writes concentrate on it, for example with
> > time-series data. So it's better to hash the keys so that they are spread
> > evenly across all the machines. HTH
> >
> > BTW, did that range query work??
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Hey Tariq,
> >>
> >> thanks for your quick answer. I'm not sure if I got the idea in the
> >> second part of your answer. You mean if I use a timestamp as a rowkey I
> >> should append a hash like this:
> >>
> >> 1357279200000+MD5HASH
> >>
> >> and then the data would be distributed more equally?
> >>
> >>
> >> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> >> > Hello Paul,
> >> >
> >> >     Try this and see if it works :
> >> >        scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
> >> >        scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
> >> >
> >> > Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
> >> > Just add a hash to your rowkeys so that data is distributed evenly on all
> >> > the RSs.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> > https://mtariq.jux.com/
> >> > cloudfront.blogspot.com
> >> >
> >> >
> >> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <
> >> > [EMAIL PROTECTED]> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I'm currently playing with hbase. The design of the rowkey seems to be
> >> >> critical.
> >> >>
> >> >> The rowkey for a certain database table of mine is:
> >> >>
> >> >> timestamp+ipaddress
> >> >>
> >> >> It looks something like this when performing a scan on the table in the
> >> >> shell:
> >> >> hbase(main):012:0> scan 'ToyDataTable'
> >> >> ROW                                         COLUMN+CELL
> >> >>  1357020000000+192.168.178.9                column=CF:SampleCol,
> >> >> timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
> >> >>
> >> >> Since I have several rows for different timestamps, I'd like to restrict
> >> >> a scan to just a range of the table, for example from 2013-01-07 to
> >> >> 2013-01-09. Previously I only had a timestamp as the rowkey and I
> >> >> could restrict the scan like that:
> >> >>
> >> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
> >> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
> >> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
> >> >>
> >> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
> >> >> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
> >> >>                      Bytes.toBytes(endDate.getTime()));
> >> >>
> >> >> But this no longer works with my new design.
> >> >>
> >> >> Is there a way to tell the scan object to filter the rows with respect
> >> >> to the timestamp, or do I have to use a filter object?
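
For anyone reading the thread later, here is a small sketch of why the binary bounds stop working once the rowkey becomes the string "timestamp+ipaddress", and why the string bounds suggested above (with +1 on the end) do work. It is plain Java with no HBase dependency: the comparison mimics the unsigned, byte-wise ordering HBase uses for row keys, and `ByteBuffer.putLong` stands in for the 8-byte big-endian encoding of a long. The class name, helper names and the second IP address are made up for the example; the millisecond values are derived from the sample key in the shell output, assuming whole 24-hour days in the same time zone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeyRangeSketch {

    // Unsigned, byte-wise lexicographic comparison of two keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // True when start <= row < stop (inclusive start, exclusive stop).
    public static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return compare(row, start) >= 0 && compare(row, stop) < 0;
    }

    public static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long base = 1357020000000L;        // sample key from the shell output
        long day = 86_400_000L;
        long startMillis = base + 6 * day; // six days after the sample row
        long endMillis = base + 9 * day;   // nine days after the sample row

        byte[] first = ascii(startMillis + "+192.168.178.9");
        byte[] last = ascii(endMillis + "+10.0.0.1");

        // String bounds: the +1 keeps rows stamped exactly endMillis in range.
        byte[] start = ascii(String.valueOf(startMillis));
        byte[] stop = ascii(String.valueOf(endMillis + 1));
        System.out.println(inRange(first, start, stop)); // true
        System.out.println(inRange(last, start, stop));  // true

        // Binary bounds: 8-byte longs start with 0x00, ASCII keys with '1'.
        byte[] binStart = ByteBuffer.allocate(8).putLong(startMillis).array();
        byte[] binStop = ByteBuffer.allocate(8).putLong(endMillis).array();
        System.out.println(inRange(first, binStart, binStop)); // false
    }
}
```

The string bounds only sort correctly while every timestamp has the same number of digits, which holds for 13-digit millisecond values like the ones in this table.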