Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Awesome, I'm going to check it out and use it today. Thank you :)

On Thu, May 19, 2011 at 8:14 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> Implemented RowKeyDistributorByHashPrefix. From README:
>
> Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please
> see the example below. It creates the "distributed key" based on the
> original key value, so that later, when you have the original key and want
> to update the record, you can calculate the distributed key without a
> roundtrip to HBase.
>
> AbstractRowKeyDistributor keyDistributor =
>         new RowKeyDistributorByHashPrefix(
>                   new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));
>
> You can use your own hashing logic here by implementing a simple interface:
>
> public static interface Hasher extends Parametrizable {
>   byte[] getHashPrefix(byte[] originalKey);
>   byte[][] getAllPossiblePrefixes();
> }
>
>
> OneByteSimpleHash implements a very simple hash algorithm: the sum of all
> bytes in the row key % maxBuckets. In the example above, 15 is the
> maxBuckets count. You can use a bucket count of up to 255. Please use it
> wisely, as (the same as with the byOneByte prefix) the distributed scanner
> will instantiate this number of scans under the hood.
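
The hashing rule described above can be sketched in plain Java without any HBase dependency. This is only an illustration of "sum of all bytes % maxBuckets", not the library's source: the class name OneByteSumHashSketch is hypothetical, and treating bytes as unsigned is an assumption.

```java
// Hypothetical sketch of the rule described in the README; not HBaseWD code.
final class OneByteSumHashSketch {
    private final int maxBuckets;

    OneByteSumHashSketch(int maxBuckets) {
        // The README mentions bucket counts of up to 255.
        if (maxBuckets < 1 || maxBuckets > 255) {
            throw new IllegalArgumentException("maxBuckets must be in 1..255");
        }
        this.maxBuckets = maxBuckets;
    }

    /** One-byte prefix: (sum of all key bytes) mod maxBuckets. */
    byte[] getHashPrefix(byte[] originalKey) {
        long sum = 0;
        for (byte b : originalKey) {
            sum += b & 0xFF; // assumption: bytes are summed as unsigned values
        }
        return new byte[] {(byte) (sum % maxBuckets)};
    }

    /** Every prefix the hash can produce: one per bucket, 0..maxBuckets-1. */
    byte[][] getAllPossiblePrefixes() {
        byte[][] prefixes = new byte[maxBuckets][];
        for (int i = 0; i < maxBuckets; i++) {
            prefixes[i] = new byte[] {(byte) i};
        }
        return prefixes;
    }
}
```

For the key {123, 124, 122} and 15 buckets, the byte sum is 369 and 369 % 15 is 9, so this sketch yields the one-byte prefix 9.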
>
> With this row key hash-based distributor, you can find out the distributed
> key (and use it to update the record) without roundtrip to HBase. From
> unit-test:
>
>     // Testing simple get
>     byte[] originalKey = new byte[] {123, 124, 122};
>
>     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>     put.add(CF, QUAL, Bytes.toBytes("some"));
>     hTable.put(put);
>
>     byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
>     Result result = hTable.get(new Get(distributedKey));
>     Assert.assertArrayEquals(originalKey, keyDistributor.getOriginalKey(result.getRow()));
>     Assert.assertArrayEquals(Bytes.toBytes("some"), result.getValue(CF, QUAL));
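
To make the round trip in the test above concrete, here is a minimal standalone sketch of what a hash-prefix distributor might do to the key bytes. It assumes the distributed key is simply the one-byte hash prefix followed by the original key; the class name HashPrefixDistributorSketch is hypothetical and the code is not taken from HBaseWD.

```java
import java.util.Arrays;

// Hypothetical sketch: prefix + original key, and the reverse. Not HBaseWD code.
final class HashPrefixDistributorSketch {
    private final int maxBuckets;

    HashPrefixDistributorSketch(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    /** Prepends the one-byte hash prefix to the original key. */
    byte[] getDistributedKey(byte[] originalKey) {
        byte[] distributed = new byte[originalKey.length + 1];
        distributed[0] = hashPrefix(originalKey);
        System.arraycopy(originalKey, 0, distributed, 1, originalKey.length);
        return distributed;
    }

    /** Strips the one-byte prefix to recover the original key. */
    byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    // Sum of all bytes % maxBuckets; summing as unsigned is an assumption.
    private byte hashPrefix(byte[] key) {
        long sum = 0;
        for (byte b : key) {
            sum += b & 0xFF;
        }
        return (byte) (sum % maxBuckets);
    }
}
```

Because the prefix depends only on the original key, the same distributed key can be recomputed at any time, which is why the Get in the test needs no prior lookup.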
>
>
> NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
> (downloadable from https://github.com/sematext/HBaseWD)
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> P.S.
> > Can you summarize HBaseWD in your blog
> That is on my todo list! You pushed it higher to the top priority items ;)
>
>
> On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
>> I have another question about option 2. It seems like I need to handle the
>> distributed scan differently to read from start row to end row, assuming 1
>> byte hash of the original key is used as prefix since the order of the
>> original key range is different from the resulting distributed key range.
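
One way to picture the range scan asked about here: since keys from the original range end up spread over all buckets, a start/stop range has to be translated into one range per possible prefix, and the per-bucket results then merged by the caller. The sketch below only builds those ranges, assuming a one-byte prefix per bucket; the class and method names are hypothetical and this is not the library's scanner.

```java
// Hypothetical sketch: one {startRow, stopRow} pair per bucket. Not HBaseWD code.
final class BucketScanRangesSketch {

    /** Returns maxBuckets pairs; pair i is {i + startRow, i + stopRow}. */
    static byte[][][] distributedRanges(byte[] startRow, byte[] stopRow, int maxBuckets) {
        byte[][][] ranges = new byte[maxBuckets][][];
        for (int bucket = 0; bucket < maxBuckets; bucket++) {
            ranges[bucket] = new byte[][] {
                withPrefix((byte) bucket, startRow),
                withPrefix((byte) bucket, stopRow)
            };
        }
        return ranges;
    }

    // Prepends a single prefix byte to the given key.
    private static byte[] withPrefix(byte prefix, byte[] key) {
        byte[] result = new byte[key.length + 1];
        result[0] = prefix;
        System.arraycopy(key, 0, result, 1, key.length);
        return result;
    }
}
```

Within a single bucket the rows are still ordered by original key, so each per-bucket range covers exactly the rows of the original range that hashed to that bucket. Rows from different buckets arrive interleaved, so a caller that needs the original order has to merge them.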
>>
>> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>> > Alex:
>> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>> >
>> > Thanks
>> >
>> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>> >
>> > > There are several options here. E.g.:
>> > >
>> > > 1) Given that you have the "original key" of the record, you can fetch
>> > > the stored record key from HBase and use it to create a Put with
>> > > updated (or new) cells.
>> > >
>> > > Currently you'll need to use a distributed scan for that; there's no
>> > > analogue for the Get operation yet (see
>> > > https://github.com/sematext/HBaseWD/issues/1).
>> > >
>> > > Note: if you use the RowKeyDistributorByOneBytePrefix included in the
>> > > current lib, you first need to find out the real key of the stored
>> > > record by fetching data from HBase. Alternatively, see the next option:
>> > >
>> > > 2) You can create your own RowKeyDistributor implementation which will
>> > > create the "distributed key" based on the original key value, so that
>> > > later, when you have the original key and want to update the record,
>> > > you can calculate the distributed key without a roundtrip to HBase.
>> > >
>> > > E.g. your RowKeyDistributor implementation you can calculate 1-byte