HBase >> mail # user >> more regionservers does not improve performance


Re: more regionservers does not improve performance
Matt,

Yes, I did. What I observed is that the map job proceeds about 3-4x faster
for a while. But then I observed long pauses partway through the job, and
the overall run time was reduced only modestly, from 50 minutes to 40
minutes.

Just to summarize the issue, my mapper jobs seem to scale nicely. This is
expected, as my DFS block size is small enough to create over 500 tasks, and
I have a max of 40 mappers running.

But when I include puts to HBase in my job, I see a 4-6x slowdown that does
not improve with an increasing number of regionservers.

My current best guess is that there is a network bottleneck in getting the
puts produced by the mappers to the appropriate regionservers, as I assume
that once the puts are received by the regionservers, they can all operate
in parallel without slowing each other down.

Again, I am on a grid used by many others, and the machines in my cluster
are not dedicated to my job. I am mainly looking at scalability trends when
running with various numbers of regionservers.

Jon
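For reference, a minimal sketch of what the WAL-skipping and client-side
batching discussed in this thread look like, assuming a 0.94-era HTable
client API; the table name, column family, and buffer size below are
placeholders, not values from this thread:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalOffPutSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // "mytable" is a hypothetical table name.
        HTable table = new HTable(conf, "mytable");
        // Buffer puts client-side instead of sending one RPC per Put.
        table.setAutoFlush(false);
        table.setWriteBufferSize(8 * 1024 * 1024); // 8 MB; tune as needed

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // Skip the write-ahead log: faster, but edits still in the memstore
        // are lost if the regionserver dies before a flush.
        put.setWriteToWAL(false);
        table.put(put); // queued in the client-side write buffer

        table.flushCommits(); // send any buffered puts
        table.close();
    }
}
```

Skipping the WAL trades durability for speed, which is why it is framed
below as a debugging step rather than a fix.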

On Sat, Oct 13, 2012 at 10:37 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> Did you try setting put.setWriteToWAL(false) as Bryan suggested?  This may
> not be what you want in the end, but seeing what happens may help debug.
>
> Matt
>
> On Sat, Oct 13, 2012 at 8:58 AM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:
>
> > Suraj,
> >
> > I bumped my regionservers all the way up to 32g from 8g. They are running
> > on 64g and 128g machines on our cluster. Unfortunately, the machines all
> > have various states of loading (usually high) from other users.
> >
> > In ganglia I do not see any swapping, but that has been known to happen
> > from time to time.
> >
> > Thanks for your help - I'll take a look at your links.
> >
> > Jon
> >
> > On Fri, Oct 12, 2012 at 7:30 PM, Suraj Varma <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jonathan:
> > > What specific metric on ganglia did you notice for "IO is spiking"? Is
> > > it your disk IO? Is your disk swapping? Do you see cpu iowait spikes?
> > >
> > > I see you have given 8g to the RegionServer ... how much RAM is
> > > available total on that node? What heap are the individual mappers &
> > > DN set to run on (i.e. check whether you are overallocated on heap
> > > when the _mappers_ run ... causing disk swapping ... leading to IO?).
> > >
> > > There can be multiple causes ... so, you may need to look at ganglia
> > > stats and narrow the bottleneck down as described in
> > > http://hbase.apache.org/book/casestudies.perftroub.html
> > >
> > > Here's a good reference for all the memstore related tweaks you can
> > > try (and also to understand what each configuration means):
> > >
> > > http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
> > >
> > > Also, provide more details on your schema (CFs, row size), Put sizes,
> > > etc as well to see if that triggers an idea from the list.
> > > --S
> > >
> > >
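The memstore knobs covered in that post live in hbase-site.xml. A hedged
example of the settings involved; the values shown are the 0.94-era defaults,
not tuning recommendations:

```xml
<!-- hbase-site.xml: memstore-related settings (0.94-era names) -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- flush a memstore to disk once it reaches 128 MB -->
  <value>134217728</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <!-- block updates once a memstore reaches multiplier * flush.size -->
  <value>2</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <!-- max fraction of regionserver heap all memstores together may use -->
  <value>0.4</value>
</property>
```

The "blocking 256.0m size" in the sample log message quoted below is
consistent with these defaults: 128 MB flush size times a block multiplier
of 2.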
> > > On Fri, Oct 12, 2012 at 12:46 PM, Bryan Beaudreault
> > > <[EMAIL PROTECTED]> wrote:
> > > > I recommend turning on debug logging on your region servers.  You
> > > > may need to tune down certain packages back to info, because there
> > > > are a few spammy ones, but overall it helps.
> > > >
> > > > You should see messages such as "12/10/09 14:22:57 INFO
> > > > regionserver.HRegion: Blocking updates for 'IPC Server handler 41 on
> > > > 60020' on region XXX: memstore size 256.0m is >= than blocking 256.0m
> > > > size".  As you can see, this is an INFO anyway, so you should be able
> > > > to see it now if it is happening.
> > > >
> > > > You can try upping the number of IPC handlers and the memstore flush
> > > > threshold.  Also, maybe you are bottlenecked by the WAL.  Try doing
> > > > put.setWriteToWAL(false), just to see if it increases performance.  If
> > > > so, and you want to be a bit more safe with regard to the WAL, you can
> > > > try turning on deferred flush on your table.  I don't really know how to
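For reference, the IPC handler count mentioned above is also an
hbase-site.xml setting. A sketch; the value of 30 is an illustration only
(the 0.94-era default is 10):

```xml
<!-- hbase-site.xml -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- RPC handler threads per regionserver -->
  <value>30</value>
</property>
```

Deferred WAL flush is a per-table attribute rather than a site-wide
setting; in 0.94-era HBase it can be set from the shell (typically after
disabling the table) with something like
`alter 'mytable', METHOD => 'table_att', DEFERRED_LOG_FLUSH => 'true'`,
where `mytable` is a placeholder.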