HBase >> mail # user >> Read access pattern


ricla@... 2013-04-29, 15:03
Shahab Yunus 2013-04-29, 15:17
Jean-Marc Spaggiari 2013-04-29, 16:17
ricla@... 2013-04-29, 17:05
Jean-Marc Spaggiari 2013-04-29, 18:04
ricla@... 2013-04-30, 13:17
Asaf Mesika 2013-04-30, 05:49
ricla@... 2013-04-30, 14:58
Michael Segel 2013-04-30, 15:57
Shahab Yunus 2013-04-30, 16:17
Re: Read access pattern
bq. The downside that I see is the bucket_number that we have to
maintain both at the time of reading/writing and update in case of
cluster restructuring.

I agree that this maintenance can be painful. However, Phoenix
(https://github.com/forcedotcom/phoenix) now supports salting,
automating this maintenance.  If you want to salt your table, just add a
SALT_BUCKETS = <n> property at the end of your DDL statement, where <n>
is the total number of buckets (up to a max of 256).  For example:

CREATE TABLE t (date_time DATE NOT NULL, event_id CHAR(15) NOT NULL
     CONSTRAINT pk PRIMARY KEY (date_time, event_id))
     SALT_BUCKETS=10;

This prepends one byte to your row key; its value is formed by hashing
the row key and taking the result modulo 10 (the number of buckets).
This is done automatically for every upsert, and queries are
automatically distributed and their results combined as expected.
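
To make the salting idea concrete, here is a minimal sketch in plain Java. It is not Phoenix's implementation: the class name SaltSketch, the helper names, and the use of Arrays.hashCode as the hash function are all assumptions chosen for illustration. It only shows the general shape of "hash the row key, take it modulo the bucket count, prepend the result as one byte".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: Phoenix's real hash function and key layout may differ.
final class SaltSketch {

    // Hypothetical helper: derive one salt byte deterministically from the key.
    static byte saltByte(byte[] rowKey, int buckets) {
        // floorMod keeps the result in [0, buckets) even when the hash is negative.
        return (byte) Math.floorMod(Arrays.hashCode(rowKey), buckets);
    }

    // Hypothetical helper: return a new key with the salt byte prepended.
    static byte[] salted(byte[] rowKey, int buckets) {
        byte[] out = new byte[rowKey.length + 1];
        out[0] = saltByte(rowKey, buckets);
        System.arraycopy(rowKey, 0, out, 1, rowKey.length);
        return out;
    }

    public static void main(String[] args) {
        // Made-up key standing in for (date_time, event_id).
        byte[] key = "2013-04-30|event0000000001".getBytes(StandardCharsets.UTF_8);
        byte[] saltedKey = salted(key, 10);
        System.out.println("bucket = " + (saltedKey[0] & 0xFF)
                + ", salted key length = " + saltedKey.length);
    }
}
```

Because the byte is computed from the key itself, a reader who knows the key can recompute the bucket and go straight to the row. A range query cannot do that, so it has to run once per bucket and merge the results, which is the part Phoenix automates.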

Thanks,

James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/

On 04/30/2013 09:17 AM, Shahab Yunus wrote:
> Well those are *some* words :) Anyway, can you explain in a bit more detail
> why you feel so strongly about this design/approach? The salting here is
> not the only option mentioned and static hashing can be used as well. Plus,
> even in the case of salting, wouldn't the distributed scan take care of it?
> The downside that I see is the bucket_number that we have to maintain both
> at the time of reading/writing and update in case of cluster restructuring.
>
> Thanks,
> Shahab
>
>
> On Tue, Apr 30, 2013 at 11:57 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>
>> Geez that's a bad article.
>> Never salt.
>>
>> And yes there's a difference between using a salt and using the first 2-4
>> bytes from your MD5 hash.
>>
>> (Hint: Salts are random. Your hash isn't. )
>>
>> Sorry to be-itch but it's a bad idea and it shouldn't be propagated.
>>
>> On Apr 29, 2013, at 10:17 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>>
>>> I think you cannot simply use the scanner to do a range scan here, as your
>>> keys are not monotonically increasing. You need to apply logic to
>>> decode/reverse the mechanism that you used to hash your keys at the
>>> time of writing. You might want to check out the SemaText library, which
>>> does distributed scans and seems to handle the scenarios that you want to
>>> implement.
>>>
>>> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>>>
>>> On Mon, Apr 29, 2013 at 11:03 AM, <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a rowkey defined by:
>>>>         getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>>>> (Long.MAX_VALUE - changeDate.getTime()));
>>>>
>>>> How can I get the previous and next rows for a given rowkey?
>>>> For instance, I have the following ordered keys:
>>>>
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
>>>>
>>>> If I choose the rowkey
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would be the
>>>> correct scan to get the previous and next keys?
>>>> The result would be:
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>>>>
>>>> Thank you !
>>>> R.
>>>>
>>>>
>>
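
The previous/next-row question in the original message can be modelled without a cluster. The sketch below is an assumption-laden illustration, not HBase client code: it uses a java.util.TreeSet to stand in for HBase's sorted row keys (for these ASCII keys, Java string order matches HBase's byte order), and the class and method names are hypothetical. Against a real table, "next" would correspond to a forward scan starting just after the chosen key, and "previous" to a reverse scan where the HBase version supports one.

```java
import java.util.Arrays;
import java.util.TreeSet;

// In-memory model of sorted row keys; names here are illustrative only.
final class NeighborKeysSketch {

    // Greatest key strictly less than the given key, or null if none exists.
    static String previousKey(TreeSet<String> keys, String key) {
        return keys.lower(key);
    }

    // Smallest key strictly greater than the given key, or null if none exists.
    static String nextKey(TreeSet<String> keys, String key) {
        return keys.higher(key);
    }

    // The five sample keys quoted in the thread.
    static TreeSet<String> sampleKeys() {
        return new TreeSet<>(Arrays.asList(
                "00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807",
                "00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807",
                "00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807",
                "00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807",
                "00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807"));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = sampleKeys();
        String chosen = "00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807";
        System.out.println("previous: " + previousKey(keys, chosen));
        System.out.println("next:     " + nextKey(keys, chosen));
    }
}
```

For the chosen key this yields the two neighbours listed in the question. It works here because all five keys share the same MD5 prefix and the reversed timestamps are fixed-width, so lexicographic order agrees with numeric order within one object's rows.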
Michael Segel 2013-04-30, 17:06
lars hofhansl 2013-05-01, 05:12
Michael Segel 2013-05-01, 14:14
Shahab Yunus 2013-05-01, 14:21
Naidu MS 2013-05-01, 07:25