Re: RegionServer dying every two or three days
You could always try going with a somewhat smaller heap and see how it works
for your particular workload, maybe 4G: 1G block cache, 1G memstores, ~1G
GC overhead(?), leaving 1G for active program data.

If you're trying to squeeze memory, be aware there is a limitation in
0.90 where storefile indexes come out of that remaining 1G rather than
being stored in the block cache.  If you have big indexes, you would need
to shrink the block cache and memstore limits to compensate.
http://search-hadoop.com/m/OH4cT1LiN4Q1/corgan&subj=Re+a+question+storefileIndexSize
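
For reference, a rough sketch of the knobs involved in that split (property
names are the usual 0.90-era ones; the exact fractions below are only
illustrative for a 4G heap, not a recommendation):

  # hbase-env.sh -- max heap for HBase daemons, value in MB
  export HBASE_HEAPSIZE=4096

  <!-- hbase-site.xml -->
  <!-- block cache: 25% of a 4G heap, i.e. ~1G -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.25</value>
  </property>
  <!-- all memstores together capped at 25% of heap, i.e. ~1G -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.25</value>
  </property>
  <!-- memstores flush down to this fraction once the upper limit is hit -->
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.2</value>
  </property>

Shrinking those two fractions further is what frees up heap for big
storefile indexes, per the 0.90 limitation described above.
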
On Mon, Jan 23, 2012 at 4:32 AM, Leonardo Gamas
<[EMAIL PROTECTED]> wrote:

> Thanks again Matt! I will try out this instance type, but I'm concerned
> about the MapReduce cluster running apart from HBase in my case, since we
> have some MapReduce jobs running and plan to run more. It feels like
> losing the great strength of MapReduce by running it far from the data.
>
> 2012/1/21 Matt Corgan <[EMAIL PROTECTED]>
>
> > We actually don't run map/reduce on the same machines (most of our jobs
> > are on an old message-based system), so we don't have much experience
> > there.  We run only HDFS (1G heap) and HBase (5.5G heap) with 12 * 100GB
> > EBS volumes per regionserver, and ~350 regions/server at the moment.
> > 5.5G is already a small heap in the HBase world, so I wouldn't recommend
> > decreasing it to fit M/R.  You could always run map/reduce on separate
> > servers, adding or removing servers as needed (more at night?), or use
> > Amazon's Elastic M/R.
> >
> >
> > On Sat, Jan 21, 2012 at 5:04 AM, Leonardo Gamas
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks Matt for this insightful article, I will run my cluster with
> > > c1.xlarge to test its performance. But I'm concerned about this machine
> > > because of the amount of RAM available, only 7GB. How many map/reduce
> > > slots do you configure? And how much heap for HBase? How many regions
> > > per RegionServer could my cluster support?
> > >
> > > 2012/1/20 Matt Corgan <[EMAIL PROTECTED]>
> > >
> > > > I run c1.xlarge servers and have found them very stable.  I see 100
> > > > Mbit/s sustained bi-directional network throughput (200 Mbit/s total),
> > > > sometimes up to 150 * 2 Mbit/s.
> > > >
> > > > Here's a pretty thorough examination of the underlying hardware:
> > > >
> > > > http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
> > > >
> > > >
> > > > *High-CPU instances*
> > > >
> > > > The high-CPU instances (c1.medium, c1.xlarge) run on systems with
> > > > dual-socket Intel Xeon E5410 2.33GHz processors. It is dual-socket
> > > > because we see APIC IDs 0 to 7, and the E5410 only has 4 cores. A
> > > > c1.xlarge instance almost takes up the whole physical machine.
> > > > However, we frequently observe steal cycles on a c1.xlarge instance
> > > > ranging from 0% to 25%, with an average of about 10%. The amount of
> > > > steal cycles is not enough to host another smaller VM, i.e., a
> > > > c1.medium. Maybe those steal cycles are used to run Amazon’s software
> > > > firewall (security group). On PassMark CPU Mark, a c1.xlarge machine
> > > > achieves 7,962.6, actually higher than an average dual-socket E5410
> > > > system is able to achieve (average is 6,903).
> > > >
> > > >
> > > >
> > > > On Fri, Jan 20, 2012 at 8:03 AM, Leonardo Gamas
> > > > <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thanks Neil for sharing your experience with AWS! Could you tell us
> > > > > what instance type you are using?
> > > > > We are using m1.xlarge, which has 4 virtual cores, but I normally see
> > > > > recommendations for machines with 8 cores like c1.xlarge, m2.4xlarge,
> > > > > etc. In principle these 8-core machines don't suffer as much from I/O
> > > > > problems since they don't share the physical server. Is there any
> > > > > piece of information from Amazon or other source that affirms that or
> > > > > it's