HBase >> mail # user >> more regionservers does not improve performance


Re: more regionservers does not improve performance
Suraj,

Thanks for the quick reply.

I tried various values of hbase.hstore.blockingStoreFiles and
hbase.hregion.memstore.block.multiplier, but this did not have any effect I
could discern. I am still getting about 5K rows per second regardless of
the number of regionservers I am running.
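
For a rough sense of scale, the arithmetic below combines the 5K rows per second figure with the 40 mappers and 1000-put batches from the original message quoted further down. It is only a sketch: it assumes 5K rows per second is the aggregate rate and that the load is shared evenly by the mappers, and the variable names are made up for illustration.

```python
# Figures quoted in this thread. Treating 5K rows/s as the aggregate rate,
# shared evenly by all mappers, is an assumption.
total_rows_per_sec = 5000
mappers = 40
batch_size = 1000

rows_per_mapper = total_rows_per_sec / mappers      # rows/s each mapper gets through
seconds_per_batch = batch_size / rows_per_mapper    # time to drain one batch of puts

print(f"{rows_per_mapper:.0f} rows/s per mapper, "
      f"one {batch_size}-put batch every {seconds_per_batch:.0f} s")
```

Under those assumptions each mapper completes one batch of puts roughly every 8 seconds.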

After asking around I discovered we do have ganglia installed on our grid.
Taking a look at some of the machines running my regionservers I do see a
spike in I/O when I run my MR job. Not sure if this is the bottleneck for
me though, as the spikes are at about 20-30 MB/sec.

Jon

On Fri, Oct 12, 2012 at 12:34 AM, Suraj Varma <[EMAIL PROTECTED]> wrote:

> What values have you configured for hbase.hstore.blockingStoreFiles and
> hbase.hregion.memstore.block.multiplier? Both of these block updates
> when the limit is hit. Try increasing these to say 20 and 4 from the
> default 7 and 2 and see if it helps.
>
> If this still doesn't help, see if you can set up ganglia to get a
> better insight into what is bottlenecking.
> --Suraj
>
>
>
> On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> <[EMAIL PROTECTED]> wrote:
> > OK, looks like I missed that part of your original mail. Did
> you try some of the compaction tweaks and configurations explained in
> the following link for your data?
> > http://hbase.apache.org/book/regions.arch.html#compaction
> >
> >
> > Also, how much data are you putting into the regions, and how big is
> one region at the end of data ingestion?
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> > -----Original Message-----
> > From: Jonathan Bishop [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, October 12, 2012 12:04 PM
> > To: [EMAIL PROTECTED]
> > Subject: RE: more regionservers does not improve performance
> >
> > Pankaj,
> >
> > Thanks for the reply.
> >
> > Actually, I am using MD5 hashing to evenly spread the keys among the
> splits, so I don’t believe there is any hotspot. In fact, when I monitor
> the web UI for HBase I see a very even load on all the regionservers.
> >
> > Jon
> >
> >
> >
> >  *From:* Pankaj Misra <[EMAIL PROTECTED]>
> > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > *To:* [EMAIL PROTECTED]
> > *Subject:* RE: more regionservers does not improve performance
> >
> > Hi Jonathan,
> >
> > It seems to me that, while the work is split across all 40 mappers,
> the keys are not randomized enough to leverage multiple regions and the
> pre-split strategy. All 40 mappers may be writing to a single region for
> some time, making it a HOT region, until the keys fall into another region,
> which then becomes the HOT region. As a result you may be seeing a high
> impact from compaction cycles, reducing your throughput.
> >
> > Are the keys incremental? Are the keys randomized enough across the
> splits?
> >
> > Ideally when all 40 mappers are running you should see all the regions
> being filled up in parallel for maximum throughput. Hope it helps.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Jonathan Bishop [[EMAIL PROTECTED]]
> > Sent: Friday, October 12, 2012 5:38 AM
> > To: [EMAIL PROTECTED]
> > Subject: more regionservers does not improve performance
> >
> > Hi,
> >
> > I am running a MR job with 40 simultaneous mappers, each of which does
> puts to HBase. I have ganged up the puts into groups of 1000 (this seems to
> help quite a bit) and also made sure that the table is pre-split into 100
> regions, and that the row keys are randomized using MD5 hashing.
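
One way to sanity-check the key distribution described above is to hash a sample of keys and count how many land in each region. The sketch below is not the poster's code: the `region_for` helper, the `row-NNNNNNNN` key format, and the assumption that the 100 regions divide the 128-bit MD5 space into equal ranges are all hypothetical, and it only models key placement, not HBase itself.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 100  # pre-split count from the message above


def region_for(key: str, num_regions: int = NUM_REGIONS) -> int:
    """Map a key to a region index, assuming the regions split the
    128-bit MD5 space into equal ranges (an assumption, not from the thread)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")   # 0 <= value < 2**128
    return (value * num_regions) >> 128     # scale into 0 .. num_regions - 1


# Sequential (incremental) input keys, hypothetical format.
counts = Counter(region_for(f"row-{i:08d}") for i in range(100_000))

print("regions hit:", len(counts))
print("fewest keys in a region:", min(counts.values()))
print("most keys in a region:", max(counts.values()))
```

If the smallest and largest per-region counts are close, the hashing is spreading even incremental keys across all regions, which matches the even load reported in the web UI.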
> >
> > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> >
> > In my MR job I know that the mappers are able to generate puts much
> faster than the puts can be handled in hbase. In other words if I let the
> mappers run without doing hbase puts then everything scales as you would
> expect with the number of mappers created. It is the hbase puts which seem