HBase >> mail # dev >> Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Awesome, I'm going to check it out and use it today. Thank you :)

On Thu, May 19, 2011 at 8:14 AM, Alex Baranau <[EMAIL PROTECTED]>wrote:

> Implemented RowKeyDistributorByHashPrefix. From README:
>
> Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please
> see the example below. It creates a "distributed key" based on the original
> key value, so that later, when you have the original key and want to update
> the record, you can calculate the distributed key without a roundtrip to
> HBase.
>
> AbstractRowKeyDistributor keyDistributor =
>         new RowKeyDistributorByHashPrefix(
>                   new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));
>
> You can use your own hashing logic here by implementing simple interface:
>
> public static interface Hasher extends Parametrizable {
>   byte[] getHashPrefix(byte[] originalKey);
>   byte[][] getAllPossiblePrefixes();
> }
>
>
> OneByteSimpleHash implements a very simple hash algorithm: the sum of all
> bytes in the row key % maxBuckets. In the example above, 15 is the maxBuckets
> count. You can use a bucket count of up to 255. Please use it wisely, as (the
> same as with the byOneByte prefix) the distributed scanner will instantiate
> this number of scans under the hood.
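
A minimal sketch of the hashing described above, assuming bytes are summed as unsigned values. The class name SimpleSumHashSketch is hypothetical and this is not the HBaseWD source; it only mirrors the two methods of the Hasher interface quoted earlier.

```java
import java.util.Arrays;

// Hypothetical stand-in for OneByteSimpleHash; not the HBaseWD source.
final class SimpleSumHashSketch {
    private final int maxBuckets;

    SimpleSumHashSketch(int maxBuckets) {
        if (maxBuckets < 1 || maxBuckets > 255) {
            throw new IllegalArgumentException("maxBuckets must be in 1..255");
        }
        this.maxBuckets = maxBuckets;
    }

    // Sum of all key bytes modulo maxBuckets, returned as a one-byte prefix.
    // Assumption: each byte is treated as unsigned (0..255) before summing.
    byte[] getHashPrefix(byte[] originalKey) {
        long sum = 0;
        for (byte b : originalKey) {
            sum += b & 0xff;
        }
        return new byte[] {(byte) (sum % maxBuckets)};
    }

    // One single-byte prefix per bucket: 0, 1, ..., maxBuckets - 1.
    byte[][] getAllPossiblePrefixes() {
        byte[][] prefixes = new byte[maxBuckets][];
        for (int i = 0; i < maxBuckets; i++) {
            prefixes[i] = new byte[] {(byte) i};
        }
        return prefixes;
    }

    public static void main(String[] args) {
        SimpleSumHashSketch hasher = new SimpleSumHashSketch(15);
        byte[] key = {123, 124, 122};
        // 123 + 124 + 122 = 369, and 369 % 15 = 9
        System.out.println(Arrays.toString(hasher.getHashPrefix(key))); // prints [9]
        System.out.println(hasher.getAllPossiblePrefixes().length);     // prints 15
    }
}
```

The length of getAllPossiblePrefixes() is the number of scans a distributed scanner would have to open, which is why a large bucket count makes every range scan more expensive.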
>
> With this row key hash-based distributor, you can find out the distributed
> key (and use it to update the record) without roundtrip to HBase. From
> unit-test:
>
>     // Testing simple get
>     byte[] originalKey = new byte[] {123, 124, 122};
>
>     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>     put.add(CF, QUAL, Bytes.toBytes("some"));
>     hTable.put(put);
>
>     byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
>     Result result = hTable.get(new Get(distributedKey));
>     Assert.assertArrayEquals(originalKey,
>         keyDistributor.getOriginalKey(result.getRow()));
>     Assert.assertArrayEquals(Bytes.toBytes("some"),
>         result.getValue(CF, QUAL));
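
To show why no roundtrip is needed, here is a rough sketch of the key mapping without any HBase dependency. It assumes the distributed key is simply the one-byte hash prefix followed by the original key, and that the hash is an unsigned byte sum modulo 15; the class HashPrefixKeySketch and its layout are illustrative, not the library's code.

```java
import java.util.Arrays;

// Hypothetical illustration of a hash-prefix key distributor.
final class HashPrefixKeySketch {
    static final int MAX_BUCKETS = 15; // same bucket count as the example above

    // Assumed hash: unsigned sum of the key bytes modulo MAX_BUCKETS.
    static byte hashPrefix(byte[] originalKey) {
        long sum = 0;
        for (byte b : originalKey) {
            sum += b & 0xff;
        }
        return (byte) (sum % MAX_BUCKETS);
    }

    // Distributed key = [prefix byte] + original key. It depends only on
    // the original key, so it can be recomputed at any time.
    static byte[] getDistributedKey(byte[] originalKey) {
        byte[] distributed = new byte[originalKey.length + 1];
        distributed[0] = hashPrefix(originalKey);
        System.arraycopy(originalKey, 0, distributed, 1, originalKey.length);
        return distributed;
    }

    // Original key = distributed key without the leading prefix byte.
    static byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        byte[] originalKey = {123, 124, 122};
        byte[] distributedKey = getDistributedKey(originalKey);
        System.out.println(Arrays.toString(distributedKey)); // prints [9, 123, 124, 122]
        System.out.println(Arrays.equals(originalKey, getOriginalKey(distributedKey))); // prints true
    }
}
```

Because the prefix is a pure function of the original key, writing and later reading the same record both compute the same distributed key.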
>
>
> NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
> (downloadable from https://github.com/sematext/HBaseWD)
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> P.S.
> > Can you summarize HBaseWD in your blog
> That is on my todo list! You pushed it higher to the top priority items ;)
>
>
> On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <[EMAIL PROTECTED]>wrote:
>
>> I have another question about option 2. It seems like I need to handle the
>> distributed scan differently to read from start row to end row, assuming 1
>> byte hash of the original key is used as prefix since the order of the
>> original key range is different from the resulting distributed key range.
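
A minimal sketch of one way to handle such a scan, assuming a one-byte bucket prefix with values 0..buckets-1: since a key in the original [start, stop) range can hash into any bucket, the same range is scanned once per bucket. The names DistributedScanSketch, prepend and bucketRanges are hypothetical and are not part of HBaseWD.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that fans one original-key range out to all buckets.
final class DistributedScanSketch {
    // Returns prefix followed by key, i.e. the key as stored in one bucket.
    static byte[] prepend(byte prefix, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = prefix;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // One {startRow, stopRow} pair per bucket, each covering the same
    // original-key range inside that bucket.
    static List<byte[][]> bucketRanges(int buckets, byte[] start, byte[] stop) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            ranges.add(new byte[][] {prepend((byte) b, start), prepend((byte) b, stop)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<byte[][]> ranges = bucketRanges(3, new byte[] {'a'}, new byte[] {'m'});
        System.out.println(ranges.size()); // prints 3
    }
}
```

Rows within each bucket come back ordered by original key, so a client that needs global order would still have to merge the per-bucket results, for example by repeatedly taking the smallest original key across the open scanners.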
>>
>> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>> > Alex:
>> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>> >
>> > Thanks
>> >
>> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <[EMAIL PROTECTED]
>> > >wrote:
>> >
>> > > There are several options here. E.g.:
>> > >
>> > > 1) Given that you have the "original key" of the record, you can fetch
>> > > the stored record key from HBase and use it to create a Put with updated
>> > > (or new) cells.
>> > >
>> > > Currently you'll need to use a distributed scan for that; there's no
>> > > analogue for the Get operation yet (see
>> > > https://github.com/sematext/HBaseWD/issues/1).
>> > >
>> > > Note: if you use the RowKeyDistributorByOneBytePrefix included in the
>> > > current lib, you first need to find out the real key of the stored
>> > > record by fetching data from HBase. Alternatively, see the next option:
>> > >
>> > > 2) You can create your own RowKeyDistributor implementation which will
>> > > create a "distributed key" based on the original key value, so that
>> > > later, when you have the original key and want to update the record,
>> > > you can calculate the distributed key without a roundtrip to HBase.
>> > >
>> > > E.g. your RowKeyDistributor implementation you can calculate 1-byte