HBase user mailing list: Read access pattern


Re: Read access pattern
HBASE-4811 is what you should be looking for, but it's not even close
to being implemented yet...

One option would be to have two tables, each storing the keys in the
reverse order of the other. Scanning forward in each one gives you the
key just after the current one, so together they give you both the key
before and the key after...
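A minimal sketch of the two-table idea, under a hypothetical layout: one table keys rows by the object's prefix plus the reversed timestamp (as in the original row key), and a second table keys them by the same prefix plus the plain timestamp. Each `java.util.TreeSet` stands in for a table, since HBase keeps rows sorted by unsigned byte order, which matches `String` order for ASCII keys; `higher()` plays the role of a forward scan that skips the start row. `prefix` means the MD5 hex plus whatever separator your keys use. The class and method names are illustrative, not an HBase API.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Two-table sketch: the same event is indexed twice, once newest-first and
 * once oldest-first. Each TreeSet stands in for one HBase table.
 */
final class DualOrderIndex {
    // Stand-in for the original table: prefix + (Long.MAX_VALUE - time).
    private final NavigableSet<String> newestFirst = new TreeSet<>();
    // Stand-in for the hypothetical second table: prefix + time.
    private final NavigableSet<String> oldestFirst = new TreeSet<>();

    // Zero padding keeps lexicographic order equal to numeric order.
    private static String newestFirstKey(String prefix, long time) {
        return prefix + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - time);
    }

    private static String oldestFirstKey(String prefix, long time) {
        return prefix + String.format(Locale.ROOT, "%019d", time);
    }

    /** Every write goes to both tables. */
    void put(String prefix, long time) {
        newestFirst.add(newestFirstKey(prefix, time));
        oldestFirst.add(oldestFirstKey(prefix, time));
    }

    // Forward "scan": first row strictly after startKey, if it belongs to the same object.
    private static String firstAfter(NavigableSet<String> table, String startKey, String prefix) {
        String row = table.higher(startKey);
        return (row != null && row.startsWith(prefix)) ? row : null;
    }

    /** Timestamp of the closest older entry for this object, or null. */
    Long older(String prefix, long time) {
        String row = firstAfter(newestFirst, newestFirstKey(prefix, time), prefix);
        return row == null ? null : Long.MAX_VALUE - Long.parseLong(row.substring(prefix.length()));
    }

    /** Timestamp of the closest newer entry for this object, or null. */
    Long newer(String prefix, long time) {
        String row = firstAfter(oldestFirst, oldestFirstKey(prefix, time), prefix);
        return row == null ? null : Long.parseLong(row.substring(prefix.length()));
    }
}
```

The trade-off is that every write has to go to both tables, doubling storage and write load in exchange for two cheap forward lookups.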

2013/4/29  <[EMAIL PROTECTED]>:
>
> Thanks for the quick answer.
>
>> For the next key, I think you can simply use your current key as your
>> scanner first key. You will then find the one which is just after.
>> Then you will have to verify the MD5 hash to make sure it's still for
>> the same object.
> Right, this is basically easy.
>
>> First, if you know that you are storing data about every 10 seconds,
>> set the startRow with something like
>> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>> (Long.MAX_VALUE - (changeDate.getTime() + 60000))) then read the few
>> lines you will have until you find your current line, and keep the
>> last one.
>
> Actually, it is impossible to know the time range within which there will be a next entry.
>
>>
>> Else, if you don't know, you will have to start with
>> scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId))); but you
>> might have to skip MANY lines before finding the right one. So I don't
>> really recommend that.
>
> Ouch, obviously not very efficient. Even with a filter, I assume?
>> Message of 29/04/13 18:18
>> From: "Jean-Marc Spaggiari"
>> To: [EMAIL PROTECTED]
>> Cc:
>> Subject: Re: Read access pattern
>>
>> Hum.
>>
>> For the next key, I think you can simply use your current key as your
>> scanner first key. You will then find the one which is just after.
>> Then you will have to verify the MD5 hash to make sure it's still for
>> the same object.
>>
>> scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId)) +
>> String.format("%19d\n", (Long.MAX_VALUE - changeDate.getTime())));
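A sketch of that "next key" lookup, using the example keys from the original question. A `java.util.TreeSet` stands in for the table (HBase sorts rows by byte order, which equals `String` order for ASCII keys); against a real table the same steps are a forward `Scan` with the current key as start row. Two details matter: an HBase start row is inclusive, so the first row returned is the current row itself and has to be skipped, and the MD5 part must be compared so the scan does not run into the next object's rows. `NextRowSketch` and `MD5_HEX_LENGTH` are hypothetical names.

```java
import java.util.NavigableSet;

/** Finds the row sorting just after a given key, for the same object. */
final class NextRowSketch {
    // Length of the hex MD5 prefix in the thread's row keys.
    static final int MD5_HEX_LENGTH = 32;

    static String nextKey(NavigableSet<String> table, String currentKey) {
        String md5 = currentKey.substring(0, MD5_HEX_LENGTH);
        // A scan starting at currentKey returns currentKey first (start row is
        // inclusive); higher() skips it, like calling next() twice on a scanner.
        String candidate = table.higher(currentKey);
        // Verify the MD5 part: past this object's last row lies another object.
        return (candidate != null && candidate.startsWith(md5)) ? candidate : null;
    }
}
```

With the question's keys, selecting the row ending in 9223370674468862807 returns the row ending in 9223370674984237807, which is the "next" key the question expects.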
>>
>> If you want to quickly find the one just before, I see two options.
>>
>> First, if you know that you are storing data about every 10 seconds,
>> set the startRow with something like
>> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>> (Long.MAX_VALUE - (changeDate.getTime() + 60000))) then read the few
>> lines you will have until you find your current line, and keep the
>> last one.
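A sketch of the time-window approach for the previous row, again over a `java.util.TreeSet` standing in for the table and using the question's key layout. Because the timestamp is reversed, newer rows sort first, so the row just before the current one in key order is the newer one: the start row has to be built from `changeDate.getTime()` plus the window, since subtracting it would place the start row after the current row. The sketch assumes the `*` separator that appears in the question's example keys and drops the trailing `\n` from the quoted format string; `SEPARATOR`, `MD5_HEX_LENGTH` and the method name are hypothetical. Against a real table, the current key could also serve as the scan's stop row, which HBase treats as exclusive.

```java
import java.util.Locale;
import java.util.NavigableSet;

/** Time-window lookup of the previous row; a TreeSet stands in for the table. */
final class PreviousRowSketch {
    static final int MD5_HEX_LENGTH = 32;
    // Hypothetical: matches the '*' seen in the question's example keys.
    static final String SEPARATOR = "*";

    /**
     * Returns the row sorting just before currentKey for the same object, looking
     * only at rows at most windowMillis newer; null if none is in that window.
     */
    static String previousKeyInWindow(NavigableSet<String> table, String currentKey, long windowMillis) {
        String md5 = currentKey.substring(0, MD5_HEX_LENGTH);
        long reversed = Long.parseLong(currentKey.substring(MD5_HEX_LENGTH + SEPARATOR.length()));
        long changeTime = Long.MAX_VALUE - reversed;
        // Newer rows sort first, so the start row comes from a LATER time.
        String startKey = md5 + SEPARATOR
                + String.format(Locale.ROOT, "%19d", Long.MAX_VALUE - (changeTime + windowMillis));
        String previous = null;
        // Forward scan from startKey (inclusive); the current row acts as an
        // exclusive stop row. Every row in this range shares the MD5 prefix.
        for (String row : table.tailSet(startKey, true)) {
            if (row.compareTo(currentKey) >= 0) {
                break;
            }
            previous = row; // keep the last row seen before the current one
        }
        return previous;
    }
}
```

With the example keys, a 60-second window finds nothing, because the previous row (ending in 9223370674468022807) is 14 minutes newer than the selected one; a 15-minute window finds it. That is the weakness raised in the reply: without a known bound on the gap, the window has to be widened or the full-prefix scan used instead.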
>>
>> Else, if you don't know, you will have to start with
>> scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId))); but you
>> might have to skip MANY lines before finding the right one. So I don't
>> really recommend that.
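A sketch of that fallback, starting from the object's MD5 prefix, with the same `TreeSet` stand-in for the table and hypothetical names. It reads every row newer than the current one, so its cost grows with the object's history.

```java
import java.util.NavigableSet;

/** Full-prefix lookup of the previous row; a TreeSet stands in for the table. */
final class PreviousRowFullScanSketch {
    static final int MD5_HEX_LENGTH = 32;

    /** Returns the row sorting just before currentKey for the same object, or null. */
    static String previousKey(NavigableSet<String> table, String currentKey) {
        String md5 = currentKey.substring(0, MD5_HEX_LENGTH);
        String previous = null;
        // Equivalent of scan.setStartRow(md5): read this object's rows from the
        // newest one onward until the current row is reached.
        for (String row : table.tailSet(md5, true)) {
            if (row.compareTo(currentKey) >= 0) {
                break;
            }
            previous = row; // keep the last row seen before the current one
        }
        return previous;
    }
}
```

An in-memory `NavigableSet` could answer this directly with `lower(currentKey)`; a forward-only scanner has no such backward step, which is why it has to walk.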
>>
>> JM
>>
>> 2013/4/29 Shahab Yunus :
>> > I think you cannot simply use the scanner to do a range scan here, as your
>> > keys are not monotonically increasing. You need to apply logic to
>> > decode/reverse the mechanism you used to hash your keys at
>> > write time. You might want to check out the Sematext library, which
>> > does distributed scans and seems to handle the scenarios that you want to
>> > implement.
>> > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>> >
>> >
>> > On Mon, Apr 29, 2013 at 11:03 AM, wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a rowkey defined by:
>> >> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>> >> (Long.MAX_VALUE - changeDate.getTime()));
>> >>
>> >> How can I get the previous and next rows for a given rowkey?
>> >> For instance, I have the following ordered keys:
>> >>
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>> >> >00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
>> >>
>> >> If I choose the rowkey:
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would be the
>> >> correct scan to get the previous and next keys?
>> >> The result would be:
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807