HBase, mail # user - Rowkey design and presplit table


Re: Rowkey design and presplit table
Asaf Mesika 2013-03-07, 07:42
I would convert each id to a long and then use Bytes.toBytes to convert that
long to a byte array. If the ids fit in an int, even better.
Then write all three values one after another into a single array, which will
be your rowkey.
This gives you:
* a fixed-size row key
* a small row key: 3*8 = 24 bytes if you use long, 3*4 = 12 bytes if you use int.
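
A minimal sketch of that layout, assuming the ids are non-negative (negative values would sort after positive ones when compared byte by byte). The class and method names are made up for illustration, and plain java.nio.ByteBuffer is used so the snippet runs without HBase on the classpath; its default big-endian encoding yields the same 8 bytes per value as Bytes.toBytes(long).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of HBase.
class CommentRowKey {

    // Full key: categoryId, articleId, commentId as three big-endian longs (24 bytes).
    static byte[] rowKey(long categoryId, long articleId, long commentId) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(categoryId)
                .putLong(articleId)
                .putLong(commentId)
                .array();
    }

    // 16-byte prefix shared by every comment of one article.
    static byte[] articlePrefix(long categoryId, long articleId) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(categoryId)
                .putLong(articleId)
                .array();
    }

    // 8-byte prefix shared by every comment in one category.
    static byte[] categoryPrefix(long categoryId) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(categoryId)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(7L, 42L, 1001L);
        System.out.println(key.length); // 24
        System.out.println(Arrays.toString(key));
    }
}
```

Because every part has a fixed width, the two prefixes always fall on part boundaries, so they can serve as scan start rows for the "all comments for an article" and "all comments for a category" reads.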

Why do you need to use a prefix split policy?

On Monday, March 4, 2013, Lukáš Drbal wrote:

> Hi,
>
> I have one question about rowkey design and presplitting a table.
>
> My use case:
> I need to store a lot of comments, where each comment belongs to one article
> and each article has one category.
>
> What I need:
> 1) read one comment by id (where I know commentId, articleId and
> categoryId)
> 2) read all comments for an article (I know categoryId and articleId)
> 3) read all comments for a category (I know categoryId)
>
> From these read patterns I see one good rowkey:
> <categoryId>_<articleId>_<commentId>
>
> But here I don't have a fixed-size rowkey, so I don't know how to define
> a split pattern. How can this be solved?
> These ids come from an external system and grow very fast, so adding something
> like "padding" to each part is hard.
>
> Maybe I can use a hash function for each part:
> md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but this rowkey is very
> long (3*32+2 bytes), and I don't have experience with rowkeys this long.
>
> Can someone give me some suggestions, please?
>
> Regards
>
> Lukas Drbal
>