HBase >> mail # user >> Is there a problem with having 4000 tables in a cluster?


Re: Is there a problem with having 4000 tables in a cluster?
It's better to do some "salting" of your keys for the reduce phase.
Basically, make your key something like "KeyHash + Key", then decode it
in your reducer and write the original key to HBase. Because MapReduce
sorts by the salted key, the reducers no longer write in row-key order,
which avoids the hotspotting problem on HBase.
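
To make the "KeyHash + Key" idea concrete, here is a rough sketch in plain
Python (no HBase or Hadoop calls). The function names, the "|" separator,
and the 4-character MD5 prefix are all my own assumptions for illustration,
not anything HBase prescribes; pick whatever hash and prefix length suit
your job.

```python
import hashlib

SALT_LEN = 4  # assumed prefix length (hex chars); not from the thread


def salt_key(key: str) -> str:
    """Build the MapReduce output key as '<hash prefix>|<original key>'.

    The prefix is derived from the key itself, so it is stable across
    mappers and the shuffle sort orders records by hash, not by POSITION.
    """
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:SALT_LEN]
    return prefix + "|" + key


def unsalt_key(salted: str) -> str:
    """Recover the original key in the reducer before writing to HBase.

    The hex prefix never contains '|', so splitting on the first '|' is
    safe even if the original key contains that character.
    """
    _, _, key = salted.partition("|")
    return key


if __name__ == "__main__":
    # Hypothetical keys in the [POSITION]_[WORD] layout from the question.
    keys = [f"{pos}_WORD{pos}" for pos in range(1, 6)]
    # Sorting the salted keys mimics the shuffle: order follows the hash.
    for salted in sorted(salt_key(k) for k in keys):
        print(salted, "->", unsalt_key(salted))
```

The mapper would emit salt_key(key) and the reducer would call
unsalt_key() before building the Put, so the rows stored in HBase keep
their original keys and range scans still work as before.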
On Tue, Sep 24, 2013 at 2:50 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Hi Jeremy,
>
> I don't see any issue for HBase to handle 4000 tables. However, I don't
> think it's the best solution for your use case.
>
> JM
>
>
> 2013/9/24 jeremy p <[EMAIL PROTECTED]>
>
> > Short description : I'd like to have 4000 tables in my HBase cluster.
> > Will this be a problem?  In general, what problems do you run into when
> > you try to host thousands of tables in a cluster?
> >
> > Long description : I'd like the performance advantage of pre-split
> > tables, and I'd also like to do filtered range scans.  Imagine a keyspace
> > where the key consists of : [POSITION]_[WORD] , where POSITION is a number
> > from 1 to 4000, and WORD is a string consisting of 96 characters.  The
> > value in the cell would be a single integer.  My app will examine a
> > 'document', where each 'line' consists of 4000 WORDs.  For each WORD,
> > it'll do a filtered regex lookup.  Only problem?  Say I have 200 mappers
> > and they all start at POSITION 1, my region servers would get hotspotted
> > like crazy. So my idea is to break it into 4000 tables (one for each
> > POSITION), and then pre-split the tables such that each region gets an
> > equal amount of the traffic.  In this scenario, the key would just be
> > WORD.  Dunno if this is a bad idea, would be open to suggestions.
> >
> > Thanks!
> >
> > --J
> >
>