HBase, mail # user - Rowkey design and presplit table


Lukáš Drbal 2013-03-04, 10:48
Jilal Oussama 2013-03-04, 11:01
Lukáš Drbal 2013-03-04, 20:55
Re: Rowkey design and presplit table
Ted Yu 2013-03-04, 21:06
What HBase version are you planning to use?

In 0.94, you can refer to:
src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java

You can write a policy which splits along category boundaries.

There are other split policies, in case you're interested:

./src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
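
To make "splitting along category boundaries" concrete, here is a minimal standalone sketch of the idea behind a delimiter-based prefix policy: take the candidate split row and cut it back to the part before the first delimiter. This is not the HBase source; the class and method names (CategorySplitSketch, categoryPrefix) and the sample key are hypothetical and only illustrate the truncation step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not HBase code.
final class CategorySplitSketch {

    /**
     * Returns the bytes of rowKey that precede the first delimiter.
     * If the key contains no delimiter, a copy of the whole key is returned.
     */
    static byte[] categoryPrefix(byte[] rowKey, byte delimiter) {
        for (int i = 0; i < rowKey.length; i++) {
            if (rowKey[i] == delimiter) {
                return Arrays.copyOf(rowKey, i);
            }
        }
        return rowKey.clone();
    }

    public static void main(String[] args) {
        // Assumed sample key in the <categoryId>_<articleId>_<commentId> layout.
        byte[] candidate = "17_4521_990".getBytes(StandardCharsets.UTF_8);
        byte[] splitPoint = categoryPrefix(candidate, (byte) '_');
        System.out.println(new String(splitPoint, StandardCharsets.UTF_8)); // prints 17
    }
}
```

Because "17" sorts before every key that starts with "17_", a region boundary placed at the truncated key keeps all rows of that category on the same side of the split.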

Cheers

On Mon, Mar 4, 2013 at 12:55 PM, Lukáš Drbal <[EMAIL PROTECTED]> wrote:

> Hi Jilal,
> thanks for the response, but could you please give me a link or explain it
> in more detail?
> I don't know what you mean by regular expression splitting. My data is
> not fixed and will grow over time.
>
> Thanks.
>
> Regards
>
> Lukas Drbal
>
>
> 2013/3/4 Jilal Oussama <[EMAIL PROTECTED]>
>
> > You can split in your application using a regular expression on the
> > underscore character, if the language supports them (like splitting the
> > data of a CSV file).
> >
> >
> > 2013/3/4 Lukáš Drbal <[EMAIL PROTECTED]>
> >
> > > Hi,
> > >
> > > I have a question about rowkey design and presplitting a table.
> > >
> > > My use case:
> > > I need to store a lot of comments, where each comment belongs to one
> > > article and each article has one category.
> > >
> > > What I need:
> > > 1) read one comment by ID (where I know commentId, articleId and
> > > categoryId)
> > > 2) read all comments for an article (I know categoryId and articleId)
> > > 3) read all comments for a category (I know categoryId)
> > >
> > > From this read pattern I see one good rowkey:
> > > <categoryId>_<articleId>_<commentId>
> > >
> > > But here the rowkey does not have a fixed size, so I don't know how to
> > > define the split pattern. How can this be solved?
> > > These IDs come from an external system and grow very fast, so adding
> > > something like "padding" to each part is hard.
> > >
> > > Maybe I can use a hash function for each part:
> > > md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but this rowkey is
> > > very long (3*32+2 bytes), and I don't have experience with rowkeys this
> > > long.
> > >
> > > Can someone give me suggestions, please?
> > >
> > > Regards
> > >
> > > Lukas Drbal
> > >
> >
>
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> LesTR
>
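
The two rowkey layouts from the original question can be sketched in plain Java as follows. This is an illustration under assumptions, not code from the thread: the class name CommentRowKeySketch, its method names, the sample IDs, and the choice to treat IDs as strings are all hypothetical. It shows the plain key, the prefixes that would serve the "all comments for an article" and "all comments for a category" reads, and the hashed variant whose length matches the 3*32+2 figure above.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; IDs are assumed to be strings without '_'.
final class CommentRowKeySketch {

    /** Plain composite key: <categoryId>_<articleId>_<commentId>. */
    static String plainKey(String categoryId, String articleId, String commentId) {
        return categoryId + "_" + articleId + "_" + commentId;
    }

    /** Prefix shared by every comment of one article. */
    static String articlePrefix(String categoryId, String articleId) {
        return categoryId + "_" + articleId + "_";
    }

    /** Prefix shared by every comment of one category. */
    static String categoryPrefix(String categoryId) {
        // The trailing '_' keeps category "1" from also matching "17_...".
        return categoryId + "_";
    }

    /** Lower-case hex MD5 of the input, always 32 characters. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Fixed-width variant: md5(<categoryId>)_md5(<articleId>)_md5(<commentId>). */
    static String hashedKey(String categoryId, String articleId, String commentId) {
        return md5Hex(categoryId) + "_" + md5Hex(articleId) + "_" + md5Hex(commentId);
    }

    public static void main(String[] args) {
        String key = plainKey("17", "4521", "990");
        System.out.println(key);                                       // prints 17_4521_990
        System.out.println(key.startsWith(articlePrefix("17", "4521"))); // prints true
        System.out.println(key.startsWith(categoryPrefix("1")));         // prints false
        System.out.println(hashedKey("17", "4521", "990").length());     // prints 98
    }
}
```

The hashed form gives every part a fixed width, so split points are easy to choose, but rows are then ordered by hash value rather than by the original IDs; the prefix reads still work because each prefix is itself a hash.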
Lukáš Drbal 2013-03-04, 21:27
Ted Yu 2013-03-04, 21:32
Asaf Mesika 2013-03-07, 07:42
James Taylor 2013-03-07, 08:42
Lukáš Drbal 2013-03-07, 22:32