HBase, mail # user - more regionservers does not improve performance


Re: more regionservers does not improve performance
Jonathan Bishop 2012-10-12, 19:07
Suraj,

Thanks for the quick reply.

I tried various values of hbase.hstore.blockingStoreFiles and
hbase.hregion.memstore.block.multiplier, but this did not have any effect I
could discern. I am still getting about 5K rows per second regardless of
the number of regionservers I am running.
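For reference, here is a rough sketch of how I understand these two settings to gate writes. This is not HBase code: the class and method names are made up for illustration, the 128 MB flush size is an assumption (check hbase.hregion.memstore.flush.size on your cluster), and the exact comparisons inside HBase may differ.

```java
// Illustrative model only; none of these names come from HBase itself.
class WriteBlockingSketch {

    // Updates to a region block once its memstore reaches
    // flush size * hbase.hregion.memstore.block.multiplier.
    static boolean memstoreBlocks(long memstoreBytes, long flushSizeBytes, int multiplier) {
        return memstoreBytes >= flushSizeBytes * multiplier;
    }

    // Updates also block when any one store holds more files than
    // hbase.hstore.blockingStoreFiles, until a compaction catches up.
    static boolean storeFilesBlock(int storeFileCount, int blockingStoreFiles) {
        return storeFileCount > blockingStoreFiles;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long flushSize = 128 * mb; // assumed value, not taken from this thread

        // A region holding 300 MB in its memstore, under both multipliers.
        System.out.println("multiplier 2: " + memstoreBlocks(300 * mb, flushSize, 2));
        System.out.println("multiplier 4: " + memstoreBlocks(300 * mb, flushSize, 4));

        // A store with 10 files, under both limits.
        System.out.println("limit 7:  " + storeFilesBlock(10, 7));
        System.out.println("limit 20: " + storeFilesBlock(10, 20));
    }
}
```

If raising both limits changes nothing, as I saw, then presumably the writes were not stalling on either threshold in the first place.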

After asking around I discovered we do have ganglia installed on our grid.
Taking a look at some of the machines running my regionservers I do see a
spike in I/O when I run my MR job. Not sure if this is the bottleneck for
me though, as the spikes are at about 20-30 MB/sec.

Jon
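P.S. In case it helps anyone check the key distribution, below is a minimal sketch of the kind of MD5 spreading described in the quoted mails. It is not my actual job code: the class name, the "row-N" sample keys, and the mapping of the first two hash bytes onto 100 equal-width ranges are all assumptions made for illustration, and real split points depend on how the table was pre-split.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only; names and the region mapping are hypothetical.
class Md5KeySpreadSketch {

    // Hash the natural key with MD5 and return the 16 raw digest bytes.
    static byte[] md5(String naturalKey) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Map the first two hash bytes (0..65535) onto `regions` equal-width ranges.
    static int regionFor(byte[] hash, int regions) {
        int prefix = ((hash[0] & 0xFF) << 8) | (hash[1] & 0xFF);
        return (int) ((long) prefix * regions / 65536);
    }

    // Count how many of `keys` sample keys land in each region.
    static int[] histogram(int keys, int regions) {
        int[] counts = new int[regions];
        for (int i = 0; i < keys; i++) {
            counts[regionFor(md5("row-" + i), regions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(100_000, 100);
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        // A small gap between min and max suggests no single region is hot.
        System.out.println("min=" + min + " max=" + max);
    }
}
```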

On Fri, Oct 12, 2012 at 12:34 AM, Suraj Varma <[EMAIL PROTECTED]> wrote:

> What have you configured your hbase.hstore.blockingStoreFiles and
> hbase.hregion.memstore.block.multiplier? Both of these block updates
> when the limit is hit. Try increasing these to say 20 and 4 from the
> default 7 and 2 and see if it helps.
>
> If this still doesn't help, see if you can set up ganglia to get a
> better insight into what is bottlenecking.
> --Suraj
>
>
>
> On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> <[EMAIL PROTECTED]> wrote:
> > OK, looks like I missed that part in your original mail. Did
> you try some of the compaction tweaks and configurations as explained in
> the following link for your data?
> > http://hbase.apache.org/book/regions.arch.html#compaction
> >
> >
> > Also, how much data are you putting into the regions, and how big is
> one region at the end of data ingestion?
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> > -----Original Message-----
> > From: Jonathan Bishop [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, October 12, 2012 12:04 PM
> > To: [EMAIL PROTECTED]
> > Subject: RE: more regionservers does not improve performance
> >
> > Pankaj,
> >
> > Thanks for the reply.
> >
> > Actually, I am using MD5 hashing to evenly spread the keys among the
> splits, so I don’t believe there is any hotspot. In fact, when I monitor
> the web UI for HBase I see a very even load on all the regionservers.
> >
> > Jon
> >
> >
> >
> >  *From:* Pankaj Misra <[EMAIL PROTECTED]>
> > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > *To:* [EMAIL PROTECTED]
> > *Subject:* RE: more regionservers does not improve performance
> >
> > Hi Jonathan,
> >
> > It seems to me that, while doing the split across all 40 mappers,
> the keys are not randomized enough to leverage multiple regions and the
> pre-split strategy. This may be happening because all 40 mappers may be
> trying to write to a single region for some time, making it a HOT region,
> until the keys fall into another region, and then that region becomes
> a HOT region. Hence you may be seeing a high impact of compaction cycles
> reducing your throughput.
> >
> > Are the keys incremental? Are the keys randomized enough across the
> splits?
> >
> > Ideally when all 40 mappers are running you should see all the regions
> being filled up in parallel for maximum throughput. Hope it helps.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Jonathan Bishop [[EMAIL PROTECTED]]
> > Sent: Friday, October 12, 2012 5:38 AM
> > To: [EMAIL PROTECTED]
> > Subject: more regionservers does not improve performance
> >
> > Hi,
> >
> > I am running a MR job with 40 simultaneous mappers, each of which does
> puts to HBase. I have ganged up the puts into groups of 1000 (this seems to
> help quite a bit) and also made sure that the table is pre-split into 100
> regions, and that the row keys are randomized using MD5 hashing.
> >
> > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> >
> > In my MR job I know that the mappers are able to generate puts much
> faster than the puts can be handled in hbase. In other words if I let the
> mappers run without doing hbase puts then everything scales as you would
> expect with the number of mappers created. It is the hbase puts which seem