Re: more regionservers does not improve performance
Kevin,

Sorry, I am fairly new to HBase. Can you be specific about what settings I
can change, and also where they are specified?

Pretty sure I am not hotspotting, and increasing memstore does not seem to
have any effect.

I do not see any messages in my regionserver logs concerning blocking.

I suspect that I am hitting some limit in our grid, but would like to
know where that limit is being imposed.

Jon
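
For readers with the same question about where these settings live: the two
properties Suraj names in the quoted message below are region server settings,
and they are conventionally placed in conf/hbase-site.xml on each region
server, which usually needs a restart to pick them up. The fragment below is
only a sketch using the values Suraj suggests (20 and 4); defaults differ
between HBase versions, so check them against the release you run.

```xml
<!-- conf/hbase-site.xml on each region server (sketch, values from the thread) -->
<configuration>
  <!-- Updates to a region are blocked once a store has this many store files. -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
  <!-- Updates are blocked when a region's memstore reaches
       flush size x this multiplier. -->
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
  </property>
</configuration>
```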

On Fri, Oct 12, 2012 at 6:44 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> Jonathan,
>
>   Let's take a deeper look here.
>
> What is your memstore set at for the table/CF in question?  Let's compare
> that value with the flush size you are seeing for your regions.  If the
> flushes are really small, are they all to the same region?  If so, that
> points to a schema issue.  If they are full flushes, you can raise your
> memstore size, assuming you have the heap to cover it.  If they are smaller
> flushes but to different regions, you are most likely suffering from global
> limit pressure and flushing too soon.
>
> Are you flushing prematurely due to HLogs rolling?  Look for too many
> HLogs and check the flushes.  It may benefit you to raise the HLog
> limit.
>
> Are you blocking?  As Suraj was saying, you may be blocking in 90-second
> intervals.  Check the RS logs for those messages as well, and then try
> Suraj's advice.
>
> This is where I would start to optimize your write path.  I hope the above
> helps.
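
Kevin's point about global limit pressure can be made concrete with some
rough arithmetic. The sketch below is not HBase code. It assumes a global
memstore limit of 40% of heap and a 128 MB per-region flush size, which were
common defaults around the time of this thread (check your version), and it
assumes every actively written region fills at the same rate. All function
names are made up for illustration.

```python
MB = 1024 * 1024


def global_memstore_limit(heap_bytes, upper_limit_fraction=0.4):
    """Bytes of heap all memstores on one region server may use together.

    The 0.4 default is an assumption; check your HBase version.
    """
    return int(heap_bytes * upper_limit_fraction)


def expected_flush_size(heap_bytes, active_regions,
                        flush_size=128 * MB, upper_limit_fraction=0.4):
    """Rough memstore size at flush time under an even-fill model.

    If the per-region share of the global limit is smaller than the
    configured flush size, flushes are forced early by global pressure.
    """
    limit = global_memstore_limit(heap_bytes, upper_limit_fraction)
    fair_share = limit // active_regions
    return min(flush_size, fair_share)


def blocking_threshold(flush_size=128 * MB, block_multiplier=2):
    """Memstore size at which updates to a single region are blocked."""
    return flush_size * block_multiplier


if __name__ == "__main__":
    heap = 4 * 1024 * MB  # hypothetical 4 GB region server heap
    for regions in (4, 10, 40):
        size_mb = expected_flush_size(heap, regions) / MB
        print("%d active regions -> flushes of about %.0f MB" % (regions, size_mb))
    print("blocking at %.0f MB per region" % (blocking_threshold() / MB))
```

If the flush sizes in the region server logs are well below the configured
flush size and spread across many regions, that matches the "fair share is
too small" case in this model, and either more heap or fewer actively
written regions per server would be the knobs to look at.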
>
> On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma <[EMAIL PROTECTED]> wrote:
>
> > What have you configured for hbase.hstore.blockingStoreFiles and
> > hbase.hregion.memstore.block.multiplier? Both of these block updates
> > when the limit is hit. Try increasing these to, say, 20 and 4 from the
> > defaults of 7 and 2 and see if it helps.
> >
> > If this still doesn't help, see if you can set up ganglia to get a
> > better insight into what is bottlenecking.
> > --Suraj
> >
> >
> >
> > On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> > <[EMAIL PROTECTED]> wrote:
> > > OK, looks like I missed reading that part in your original mail. Did
> > > you try some of the compaction tweaks and configurations explained at
> > > the following link for your data?
> > > http://hbase.apache.org/book/regions.arch.html#compaction
> > >
> > >
> > > Also, how much data are you putting into the regions, and how big is
> > > one region at the end of data ingestion?
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > -----Original Message-----
> > > From: Jonathan Bishop [mailto:[EMAIL PROTECTED]]
> > > Sent: Friday, October 12, 2012 12:04 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: RE: more regionservers does not improve performance
> > >
> > > Pankaj,
> > >
> > > Thanks  for the reply.
> > >
> > > Actually, I am using MD5 hashing to evenly spread the keys among the
> > > splits, so I don't believe there is any hotspot. In fact, when I monitor
> > > the web UI for HBase I see a very even load on all the regionservers.
> > >
> > > Jon
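
To illustrate why MD5-prefixed keys avoid the hotspot Pankaj describes, here
is a small simulation. It is a sketch only: the key layout (hex digest, a
colon, then the original key), the two-hex-digit split points, and all
function names are assumptions for illustration, not Jon's actual scheme.

```python
import bisect
import hashlib
from collections import Counter


def salted_key(raw_key):
    """Prefix the key with its MD5 hex digest so writes scatter over the key space."""
    digest = hashlib.md5(raw_key.encode("utf-8")).hexdigest()
    return (digest + ":" + raw_key).encode("utf-8")


def split_points(num_regions):
    """Evenly spaced two-hex-digit boundaries (supports up to 256 regions)."""
    step = 256 / num_regions
    return [format(int(i * step), "02x").encode("ascii")
            for i in range(1, num_regions)]


def region_index(row_key, splits):
    """Index of the region whose [start, end) key range contains row_key."""
    return bisect.bisect_right(splits, row_key)


def load_per_region(keys, splits):
    """Number of keys landing in each region, in region order."""
    counts = Counter(region_index(k, splits) for k in keys)
    return [counts.get(i, 0) for i in range(len(splits) + 1)]


if __name__ == "__main__":
    splits = split_points(16)
    raw = ["row-%08d" % i for i in range(100000)]  # incremental keys
    print("unsalted:", load_per_region([r.encode("utf-8") for r in raw], splits))
    print("salted:  ", load_per_region([salted_key(r) for r in raw], splits))
```

With incremental keys every write falls into a single region, while the
hashed keys spread almost evenly, which is consistent with the even load Jon
reports in the web UI.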
> > >
> > > Sent from my Windows 8 PC <http://windows.microsoft.com/consumer-preview>
> > >
> > >  *From:* Pankaj Misra <[EMAIL PROTECTED]>
> > > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > > *To:* [EMAIL PROTECTED]
> > > *Subject:* RE: more regionservers does not improve performance
> > >
> > > Hi Jonathan,
> > >
> > > It seems to me that, while doing the split across all 40 mappers,
> > > the keys are not randomized enough to leverage multiple regions and the
> > > pre-split strategy. This may be happening because all 40 mappers may be
> > > trying to write to a single region for some time, making it a HOT
> > > region, until the keys fall into another region, which then becomes
> > > the HOT region. Hence you may be seeing a high impact from compaction
> > > cycles, reducing your throughput.
> > >
> > > Are the keys incremental? Are the keys randomized enough across the
> > > splits?
> > >
> > > Ideally when all 40 mappers are running you should see all the regions