HBase >> mail # user >> Cluster Wide Pauses


RE: Cluster Wide Pauses
These are a different kind of pause (those caused by blockingStoreFiles).

This is HBase stepping in and actually blocking updates to a region because compactions have not been able to keep up with the write load.  It can manifest itself in the same way, but it is different from the shorter pauses caused by periodic offlining of regions during balancing and splits.
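For reference, the two knobs involved here (the same ones Chris raises later in this thread) live in hbase-site.xml. The values below are illustrative only, not recommendations; the defaults shown are those of the 0.89/0.90-era releases discussed in this thread:

```xml
<!-- Sketch of hbase-site.xml overrides; tune to your own write load. -->
<property>
  <!-- Default 7: updates to a region block once a store has this many files
       awaiting compaction. -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>16</value>
</property>
<property>
  <!-- Default 2: updates block when the memstore reaches
       multiplier * flush size. -->
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>12</value>
</property>
```

Raising both gives the regionserver more slack before it halts writes, at the cost of more memory and more files to compact later.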

Wayne, have you confirmed in your RegionServer logs that the pauses are associated with splits or region movement, and that you are not seeing the blocking store files issue?
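One way to check, sketched below: the RegionServer logs a line containing "Blocking updates" when it hits these limits, so grepping for it and comparing timestamps against the pauses can confirm or rule this out. The sample log lines here are hypothetical approximations, not output from a real cluster, and the exact message format may differ by version:

```shell
# Create a small sample log to grep against; these lines are hypothetical
# stand-ins for a RegionServer's blocking messages.
cat > rs-sample.log <<'EOF'
2011-01-14 07:10:02 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 3' on region usertable,row500: memstore size is >= than blocking size
2011-01-14 07:10:41 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region usertable,row500
EOF

# Count blocking events; correlate their timestamps with the observed pauses.
grep -c 'Blocking updates' rs-sample.log
```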

JG

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Christopher Tarnas
> Sent: Friday, January 14, 2011 7:29 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Cluster Wide Pauses
>
> I have been seeing similar problems, and by raising
> hbase.hregion.memstore.block.multiplier to above 12 (default is 2) and
> hbase.hstore.blockingStoreFiles to 16, I managed to reduce the frequency
> of the pauses during loads.  My nodes are pretty beefy (48 GB of RAM) so
> I had room to experiment.
>
> From what I understand, that gave the regionservers more buffer before
> they had to halt the world to catch up. The pauses still happen, but
> their impact is smaller now.
>
> -chris
>
> On Fri, Jan 14, 2011 at 8:34 AM, Wayne <[EMAIL PROTECTED]> wrote:
>
> > We have not found any smoking gun here. Most likely these are region
> > splits on a quickly growing/hot region that all clients get caught waiting for.
> >
> >
> > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <[EMAIL PROTECTED]> wrote:
> >
> > > Thank you for the lead! We will definitely look closer at the OS logs.
> > >
> > >
> > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <[EMAIL PROTECTED]> wrote:
> > >
> > >>
> > >> Hi Wayne,
> > >>
> > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > >> > sometimes quite a lot of them.
> > >>
> > >>
> > >> Have you checked this article from Andrei and Cosmin? They had a
> > >> busy firewall that caused a network blackout.
> > >>
> > >> http://hstack.org/hbase-performance-testing/
> > >>
> > >> Maybe it's not your case, but just to be sure.
> > >>
> > >> Thanks,
> > >>
> > >> --
> > >> Tatsuya Kawano (Mr.)
> > >> Tokyo, Japan
> > >>
> > >>
> > >> On Jan 13, 2011, at 4:52 AM, Wayne <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > >> > sometimes quite a lot of them. We have yet to correlate the pauses
> > >> > to the TCP resets but I am starting to wonder if this is partly a
> > >> > network problem. Does Gigabit Ethernet break down on high volume
> > >> > nodes? Do high volume nodes use 10G or Infiniband?
> > >> >
> > >> >
> > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> >> Jon asks that you describe your loading in the issue.  Would you
> > >> >> mind doing so?  Ted, stick up in the issue the workload and
> > >> >> configs you are running if you don't mind.  I'd like to try it over here.
> > >> >> Thanks lads,
> > >> >> St.Ack
> > >> >>
> > >> >>
> > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <[EMAIL PROTECTED]> wrote:
> > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > >> >>>
> > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <[EMAIL PROTECTED]> wrote:
> > >> >>>
> > >> >>>> We are using 0.89.20100924, r1001068
> > >> >>>>
> > >> >>>> We are seeing it during heavy write load (which is all the time),
> > >> >>>> but yesterday we had read load as well as write load and saw both
> > >> >>>> reads and writes stop for 10+ seconds. The region size is the
> > >> >>>> biggest clue we have found from our tests: setting up a new
> > >> >>>> cluster with a 1GB max region size and starting to load heavily,
> > >> >>>> we will see this a lot for long, long time frames. Maybe the
> > >> >>>> bigger file gets hung up more easily with a