HBase >> mail # user >> RegionServer dying every two or three days


Re: RegionServer dying every two or three days
We actually don't run map/reduce on the same machines (most of our jobs are
on an old message-based system), so I don't have much experience there.  We
run only HDFS (1G heap) and HBase (5.5G heap) with 12 * 100GB EBS volumes
per regionserver, and ~350 regions/server at the moment.  5.5G is already a
small heap in the HBase world, so I wouldn't recommend decreasing it to fit
M/R.  You could always run map/reduce on separate servers, adding or
removing servers as needed (more at night?), or use Amazon's Elastic M/R.
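
A rough sketch of the memory arithmetic behind that advice, using only the
figures quoted in this thread (7GB RAM on c1.xlarge, 1G HDFS heap, 5.5G HBase
heap, ~350 regions).  The per-task heap of 0.5GB is a hypothetical value
chosen for illustration, not a setting anyone in the thread reported, and JVM
heap size understates real process memory, so treat the result as optimistic.

```python
# Back-of-the-envelope memory budget for one c1.xlarge regionserver.
# Figures below come from the thread unless marked as an assumption.
INSTANCE_RAM_GB = 7.0       # RAM on c1.xlarge, as stated in the thread
HDFS_HEAP_GB = 1.0          # HDFS heap
HBASE_HEAP_GB = 5.5         # HBase heap
REGIONS_PER_SERVER = 350    # approximate regions per regionserver
TASK_HEAP_GB = 0.5          # ASSUMPTION: hypothetical heap per map/reduce task


def remaining_ram_gb(total_gb, *heaps_gb):
    """RAM left after subtracting the configured JVM heaps."""
    return total_gb - sum(heaps_gb)


def max_task_slots(free_gb, heap_per_task_gb):
    """Upper bound on task slots that fit in the remaining RAM."""
    return int(free_gb // heap_per_task_gb)


free_gb = remaining_ram_gb(INSTANCE_RAM_GB, HDFS_HEAP_GB, HBASE_HEAP_GB)
heap_per_region_mb = HBASE_HEAP_GB * 1024 / REGIONS_PER_SERVER
slots = max_task_slots(free_gb, TASK_HEAP_GB)

print(f"RAM left for OS and everything else: {free_gb:.1f} GB")
print(f"HBase heap per region: {heap_per_region_mb:.1f} MB")
print(f"Task slots at {TASK_HEAP_GB} GB each (leaving nothing for the OS): {slots}")
```

Even the single slot that fits would consume all of the remaining RAM, which
is why shrinking the HBase heap or moving M/R to other machines are the only
real options on this instance type.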
On Sat, Jan 21, 2012 at 5:04 AM, Leonardo Gamas
<[EMAIL PROTECTED]> wrote:

> Thanks Matt for this insightful article, I will run my cluster with
> c1.xlarge to test its performance. But I'm concerned about this machine
> because of the amount of RAM available, only 7GB. How many map/reduce slots
> do you configure? And how much heap for HBase? How many regions per
> RegionServer could my cluster support?
>
> 2012/1/20 Matt Corgan <[EMAIL PROTECTED]>
>
> > I run c1.xlarge servers and have found them very stable.  I see 100 Mbit/s
> > sustained bi-directional network throughput (200Mbit/s total), sometimes up
> > to 150 * 2 Mbit/s.
> >
> > Here's a pretty thorough examination of the underlying hardware:
> >
> > http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
> >
> > *High-CPU instances*
> >
> > The high-CPU instances (c1.medium, c1.xlarge) run on systems with
> > dual-socket Intel Xeon E5410 2.33GHz processors. It is dual-socket because
> > we see APIC IDs 0 to 7, and E5410 only has 4 cores. A c1.xlarge instance
> > almost takes up the whole physical machine. However, we frequently observe
> > steal cycle on a c1.xlarge instance ranging from 0% to 25% with an average
> > of about 10%. The amount of steal cycle is not enough to host another
> > smaller VM, i.e., a c1.medium. Maybe those steal cycles are used to run
> > Amazon’s software firewall (security group). On Passmark-CPU mark, a
> > c1.xlarge machine achieves 7,962.6, actually higher than an average
> > dual-socket E5410 system is able to achieve (average is 6,903).
> >
> >
> >
> > On Fri, Jan 20, 2012 at 8:03 AM, Leonardo Gamas
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks Neil for sharing your experience with AWS! Could you tell me
> > > which instance type you are using?
> > > We are using m1.xlarge, which has 4 virtual cores, but I normally see
> > > recommendations for machines with 8 cores like c1.xlarge, m2.4xlarge, etc.
> > > In principle these 8-core machines don't suffer too much from I/O problems
> > > since they don't share the physical server. Is there any piece of
> > > information from Amazon or another source that affirms that, or is it
> > > based on empirical analysis?
> > >
> > > 2012/1/19 Neil Yalowitz <[EMAIL PROTECTED]>
> > >
> > > > We have experienced many problems with our cluster on EC2.  The blunt
> > > > solution was to increase the Zookeeper timeout to 5 minutes or even more.
> > > >
> > > > Even with a long timeout, however, it's not uncommon for us to see an EC2
> > > > instance become unresponsive to pings and SSH several times during a
> > > > week.  It's been a very bad environment for clusters.
> > > >
> > > >
> > > > Neil
> > > >
> > > > On Thu, Jan 19, 2012 at 11:49 AM, Leonardo Gamas
> > > > <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Guys,
> > > > >
> > > > > I have tested the parameters provided by Sandy, and it solved the GC
> > > > > problems with the -XX:+UseParallelOldGC, thanks for the help Sandy.
> > > > > I'm still experiencing some difficulties, the RegionServer continues to
> > > > > shut down, but it seems related to I/O. It starts to time out many
> > > > > connections, new connections to/from the machine time out too, and
> > > > > finally the RegionServer dies because of YouAreDeadException. I will
> > > > > collect more data, but I think it's an Amazon/Virtualized Environment
> > > > > inherent