HBase >> mail # user >> Rowkey design question


Paul van Hoven 2013-02-19, 16:11
Mohammad Tariq 2013-02-19, 16:16
Re: Rowkey design question
Hey Tariq,

thanks for your quick answer. I'm not sure if I got the idea in the
second part of your answer. You mean if I use a timestamp as a rowkey I
should append a hash like this:

1357279200000+MD5HASH

and then the data would be distributed more equally?
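
A note on the question above, as a sketch rather than a definitive answer: HBase sorts rows by the bytes of the rowkey, so a hash appended after the timestamp leaves the keys ordered by time, and consecutive writes would still land next to each other. A common variant is to put a hash-derived prefix (a "salt" bucket) in front of the timestamp instead. The class name, method names, and the bucket count of 4 below are hypothetical, chosen only for illustration, and the code uses plain Java strings rather than the HBase client.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SaltedRowKeys {
    // Hypothetical bucket count; an assumption, not a value from the thread.
    static final int BUCKETS = 4;

    /** Timestamp first, MD5 appended: the timestamp still decides the sort order. */
    static String hashSuffixKey(long ts, String ip) {
        return ts + "+" + hex(md5(ip));
    }

    /** Salt bucket first: rows with neighbouring timestamps can fall into different key ranges. */
    static String saltPrefixKey(long ts, String ip) {
        String key = ts + "+" + ip;
        int bucket = (md5(key)[0] & 0xff) % BUCKETS; // first digest byte picks the bucket
        return bucket + "-" + key;
    }

    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long ts = 1357279200000L;
        // Print a few consecutive timestamps in both layouts to compare the ordering.
        for (int i = 0; i < 3; i++) {
            System.out.println(hashSuffixKey(ts + i, "192.168.178.9")
                    + "    " + saltPrefixKey(ts + i, "192.168.178.9"));
        }
    }
}
```

The trade-off of the salted layout is that a time-range scan has to be issued once per bucket prefix, since rows for one time window are no longer contiguous.
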

2013/2/19 Mohammad Tariq <[EMAIL PROTECTED]>:
> Hello Paul,
>
>     Try this and see if it works :
>        scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
>        scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
>
> Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
> Just add a hash to your rowkeys so that data is distributed evenly on all
> the RSs.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I'm currently playing with hbase. The design of the rowkey seems to be
>> critical.
>>
>> The rowkey for a certain database table of mine is:
>>
>> timestamp+ipaddress
>>
>> It looks something like this when performing a scan on the table in the
>> shell:
>> hbase(main):012:0> scan 'ToyDataTable'
>> ROW                                         COLUMN+CELL
>>  1357020000000+192.168.178.9                column=CF:SampleCol,
>> timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
>>
>> Since I have several rows for different timestamps, I'd like to
>> restrict a scan to just a region of the table, for example from
>> 2013-01-07 to 2013-01-09. Previously I only had a timestamp as the
>> rowkey and I could restrict the scan like this:
>>
>> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>> Date startDate = formatter.parse("2013-01-07 07:00:00");
>> Date endDate = formatter.parse("2013-01-10 07:00:00");
>>
>> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
>> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
>>                      Bytes.toBytes(endDate.getTime()));
>>
>> But this no longer works with my new design.
>>
>> Is there a way to tell the scan object to filter the rows with respect
>> to the timestamp, or do I have to use a filter object?
>>
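
The start/stop-row suggestion quoted above can be checked without a cluster. The sketch below is plain Java with no HBase dependency: `compare` mimics the unsigned byte-wise ordering HBase applies to rowkeys, `stringBound` encodes the timestamp as decimal text (as in the `getTime() + ""` suggestion), and `binaryBound` reproduces the 8-byte big-endian layout that `Bytes.toBytes(long)` yields. All class and method names are hypothetical helpers, not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class RowKeyRange {
    /** Timestamp as decimal text, matching keys like "1357020000000+192.168.178.9". */
    static byte[] stringBound(long millis) {
        return String.valueOf(millis).getBytes(StandardCharsets.UTF_8);
    }

    /** Timestamp as 8 big-endian bytes, the layout of a binary long key. */
    static byte[] binaryBound(long millis) {
        return ByteBuffer.allocate(8).putLong(millis).array();
    }

    /** Unsigned lexicographic byte comparison. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True if start <= rowKey < stop, i.e. the row lies inside the scan range. */
    static boolean inRange(byte[] rowKey, byte[] start, byte[] stop) {
        return compare(rowKey, start) >= 0 && compare(rowKey, stop) < 0;
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date startDate = formatter.parse("2013-01-07 07:00:00");
        Date endDate = formatter.parse("2013-01-10 07:00:00");

        // A row written exactly at the start date, in the timestamp+ipaddress layout.
        byte[] rowKey = (startDate.getTime() + "+192.168.178.9").getBytes(StandardCharsets.UTF_8);

        System.out.println("string bounds: " + inRange(rowKey,
                stringBound(startDate.getTime()), stringBound(endDate.getTime() + 1)));
        System.out.println("binary bounds: " + inRange(rowKey,
                binaryBound(startDate.getTime()), binaryBound(endDate.getTime())));
    }
}
```

With text keys, the binary bounds start with zero bytes while every rowkey starts with an ASCII digit, which would explain why the original scan stopped matching. The string comparison relies on all timestamps having the same number of digits, which holds for 13-digit millisecond values like the ones in this thread. Adding 1 to the stop timestamp makes rows stamped exactly at the end date fall inside the range.
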
Mohammad Tariq 2013-02-19, 17:34
Paul van Hoven 2013-02-19, 17:50
Mohammad Tariq 2013-02-19, 17:54
Asaf Mesika 2013-02-21, 22:15
Mohammad Tariq 2013-02-21, 22:25