

Re: Rowkey design and presplit table
Which HBase version are you planning to use?

In 0.94, you can refer to:
src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java

You can write a similar policy which only splits along category boundaries.

There are other split policies, in case you're interested:

src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
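
To make "splitting along category boundaries" concrete, here is a rough standalone sketch. It is not the HBase source and uses no HBase classes; the class and method names (CategoryBoundary, alignToCategory) are made up for illustration. It assumes the <categoryId>_<articleId>_<commentId> rowkey from the original question and that ids never contain the underscore themselves. The idea is that a candidate split point gets trimmed back to its category prefix, so no category ends up on both sides of a split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CategoryBoundary {
    // Hypothetical helper, not part of HBase: trims a candidate split point
    // back to "<categoryId>_" so that a split never lands inside a category.
    static byte[] alignToCategory(byte[] splitPoint, byte delimiter) {
        for (int i = 0; i < splitPoint.length; i++) {
            if (splitPoint[i] == delimiter) {
                // Keep everything up to and including the first delimiter.
                return Arrays.copyOf(splitPoint, i + 1);
            }
        }
        // No delimiter found: leave the split point unchanged.
        return splitPoint;
    }

    public static void main(String[] args) {
        byte[] candidate = "12_3456_789".getBytes(StandardCharsets.UTF_8);
        byte[] aligned = alignToCategory(candidate, (byte) '_');
        System.out.println(new String(aligned, StandardCharsets.UTF_8)); // prints 12_
    }
}
```

A real policy would apply this kind of trimming to the split point the region server proposes; how that is wired in depends on the HBase version, so check the classes listed above for the details.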

Cheers

On Mon, Mar 4, 2013 at 12:55 PM, Lukáš Drbal <[EMAIL PROTECTED]> wrote:

> Hi Jilal,
> thanks for the response, but could you please give me a link or explain it
> in more detail?
> I don't know what you mean by regular expression splitting. My data is
> not fixed and will grow over time.
>
> Thanks.
>
> Regards
>
> Lukas Drbal
>
>
> 2013/3/4 Jilal Oussama <[EMAIL PROTECTED]>
>
> > You can split in your application using a regular expression on the
> > underscore character, if the language supports them (like splitting the
> > data of a CSV file).
> >
> >
> > 2013/3/4 Lukáš Drbal <[EMAIL PROTECTED]>
> >
> > > Hi,
> > >
> > > I have a question about rowkey design and pre-splitting a table.
> > >
> > > My use case:
> > > I need to store a lot of comments, where each comment belongs to one
> > > article and each article has one category.
> > >
> > > What I need:
> > > 1) read one comment by id (where I know commentId, articleId and
> > > categoryId)
> > > 2) read all comments for an article (I know categoryId and articleId)
> > > 3) read all comments for a category (I know categoryId)
> > >
> > > From this read pattern I see one good rowkey:
> > > <categoryId>_<articleId>_<commentId>
> > >
> > > But here the rowkey does not have a fixed size, so I don't know how to
> > > define the split pattern. How can this be solved?
> > > These ids come from an external system and grow very fast, so adding
> > > something like "padding" to each part is hard.
> > >
> > > Maybe I can use a hash function for each part:
> > > md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but this rowkey is
> > > very long (3*32+2 bytes), and I don't have experience with such long
> > > rowkeys.
> > >
> > > Can someone give me some suggestions, please?
> > >
> > > Regards
> > >
> > > Lukas Drbal
> > >
> >
>
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> LesTR
>
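
On the hashed rowkey idea from the original question: the 3*32+2 byte figure comes from hex-encoding each MD5 and joining the parts with underscores. A raw MD5 digest is only 16 bytes, so concatenating the raw digests gives a fixed 48-byte key and needs no delimiter. The sketch below is only one possible variant, not a recommendation from this thread; the class and method names (HashedRowKey, rowKey, splitPoints) are hypothetical. Prefix reads by category (first 16 bytes) or by category and article (first 32 bytes) would still work, but the natural ordering of ids is lost, and all comments of one category still share a prefix and therefore tend to sit together.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedRowKey {
    static final int MD5_LEN = 16; // raw MD5 digest length in bytes

    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be provided by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // md5(categoryId) + md5(articleId) + md5(commentId), raw bytes, 48 in total.
    static byte[] rowKey(String categoryId, String articleId, String commentId) {
        byte[] key = new byte[3 * MD5_LEN];
        System.arraycopy(md5(categoryId), 0, key, 0, MD5_LEN);
        System.arraycopy(md5(articleId), 0, key, MD5_LEN, MD5_LEN);
        System.arraycopy(md5(commentId), 0, key, 2 * MD5_LEN, MD5_LEN);
        return key;
    }

    // Evenly spaced one-byte split points for numRegions regions (2..256).
    // This relies on hashed keys being spread roughly uniformly over the
    // first byte and on rowkeys being compared as unsigned bytes.
    static byte[][] splitPoints(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("12", "3456", "789").length); // prints 48
        System.out.println(splitPoints(4).length);               // prints 3
    }
}
```

An array like the one returned by splitPoints is the kind of split-key list that can be handed to HBaseAdmin.createTable(desc, splitKeys) when creating a pre-split table. Whether 48-byte keys are acceptable depends on how small the stored values are, since the rowkey is repeated for every cell.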