HBase, mail # user - Rowkey design question


Re: Rowkey design question
Paul van Hoven 2013-02-19, 17:50
Yeah it worked fine.

But as I understand it, if I prefix my row key with a hash, something like

md5-hash + timestamp

then the rowkeys are probably evenly distributed, but how would I then
perform a scan restricted to a specific time range?
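
One common way to keep time-range scans possible with a hashed prefix is to replace the full MD5 prefix with a small, fixed number of salt buckets and run one scan per bucket. This is a variation on the prefix discussed in this thread, not something proposed in it. The sketch below uses only the JDK so it can be run without a cluster; the class name, the bucket count, and the `bucket-timestamp+ip` key layout are all assumptions made for illustration. It also assumes every timestamp has the same number of decimal digits, so that string order matches numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SaltedKeySketch {
    // Hypothetical bucket count; in practice it would be tuned to the cluster.
    static final int BUCKETS = 4;

    // Derives a stable bucket from the first byte of the MD5 digest of the IP address.
    static int bucketFor(String ip) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(ip.getBytes(StandardCharsets.UTF_8));
            return (digest[0] & 0xFF) % BUCKETS;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every Java platform
        }
    }

    // Assumed key layout: two-digit bucket, '-', timestamp, '+', IP address.
    static String rowKey(long timestampMillis, String ip) {
        return String.format(Locale.ROOT, "%02d-%d+%s", bucketFor(ip), timestampMillis, ip);
    }

    // Builds one {startRow, stopRow} pair per bucket. The start row is inclusive and
    // the stop row exclusive, so endMillis + 1 keeps rows stamped exactly endMillis.
    static List<String[]> scanRanges(long startMillis, long endMillis) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            String start = String.format(Locale.ROOT, "%02d-%d", b, startMillis);
            String stop = String.format(Locale.ROOT, "%02d-%d", b, endMillis + 1);
            ranges.add(new String[] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(rowKey(1357020000000L, "192.168.178.9"));
        for (String[] r : scanRanges(1357538400000L, 1357797600000L)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each pair would be passed to its own scan as start and stop row, and the client would merge the results. The trade-off is that a time-range query now costs one scan per bucket instead of a single scan.
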
2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> No, before the timestamp. Row keys are stored in sorted order, so keys that
> are close together go to the same region. This is the default HBase
> behavior and is meant to improve performance. But sometimes one machine
> gets overloaded because reads and writes are concentrated on it, for
> example with time-series data. So it's better to hash the keys so that
> they are spread evenly across all the machines. HTH
>
> BTW, did that range query work??
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <
> [EMAIL PROTECTED]> wrote:
>
>> Hey Tariq,
>>
>> thanks for your quick answer. I'm not sure if I got the idea in the
>> second part of your answer. You mean if I use a timestamp as a rowkey I
>> should append a hash like this:
>>
>> 1357279200000+MD5HASH
>>
>> and then the data would be distributed more equally?
>>
>>
>> 2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
>> > Hello Paul,
>> >
>> >     Try this and see if it works:
>> >        scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
>> >        scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
>> >
>> > Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
>> > Just add a hash to your rowkeys so that data is distributed evenly on all
>> > the RSs.
>> >
>> > Warm Regards,
>> > Tariq
>> > https://mtariq.jux.com/
>> > cloudfront.blogspot.com
>> >
>> >
>> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm currently playing with hbase. The design of the rowkey seems to be
>> >> critical.
>> >>
>> >> The rowkey for a certain database table of mine is:
>> >>
>> >> timestamp+ipaddress
>> >>
>> >> It looks something like this when performing a scan on the table in the
>> >> shell:
>> >> hbase(main):012:0> scan 'ToyDataTable'
>> >> ROW                                         COLUMN+CELL
>> >>  1357020000000+192.168.178.9                column=CF:SampleCol,
>> >> timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
>> >>
>> >> Since I have several rows for different timestamps, I'd like to restrict a
>> >> scan to just a region of the table, for example from 2013-01-07 to
>> >> 2013-01-09. Previously I only had a timestamp as the rowkey and I
>> >> could restrict the scan like that:
>> >>
>> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
>> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
>> >>
>> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
>> >> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
>> >>                      Bytes.toBytes(endDate.getTime()));
>> >>
>> >> But this no longer works with my new design.
>> >>
>> >> Is there a way to tell the scan object to filter the rows with respect
>> >> to the timestamp, or do I have to use a filter object?
>> >>
>>
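
For the `timestamp+ipaddress` key in the original question, the string-based start and stop rows suggested in the reply work because row keys are compared byte by byte, and a bare timestamp string sorts just before any key that begins with it. The sketch below reproduces that comparison with plain JDK code so it can be checked without a cluster. The class and method names are hypothetical, and it assumes the keys are stored as UTF-8 strings whose timestamps all have the same number of digits.

```java
import java.nio.charset.StandardCharsets;

final class StringKeyRangeSketch {
    // Unsigned lexicographic byte comparison, the ordering applied to row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix sorts first
    }

    // True if rowKey falls in [startMillis, endMillis + 1) when the bounds are
    // written as decimal strings, mirroring the setStartRow/setStopRow calls above.
    static boolean inRange(String rowKey, long startMillis, long endMillis) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        byte[] start = Long.toString(startMillis).getBytes(StandardCharsets.UTF_8);
        byte[] stop = Long.toString(endMillis + 1).getBytes(StandardCharsets.UTF_8);
        return compareBytes(key, start) >= 0 && compareBytes(key, stop) < 0;
    }

    public static void main(String[] args) {
        long start = 1357538400000L; // example lower bound
        long end = 1357797600000L;   // example upper bound
        System.out.println(inRange("1357020000000+192.168.178.9", start, end));
        System.out.println(inRange("1357624800000+192.168.178.9", start, end));
    }
}
```

The earlier `Bytes.toBytes(startDate.getTime())` bounds stop matching once the key becomes a string, since they encode the long as 8 raw bytes rather than as decimal digits.
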