Re: 0.92 and Read/writes not scaling
Matt Corgan 2012-03-19, 16:55
I'd be curious to see what happens if you split the table into 1 region per
CPU core, so 24 cores * 11 servers = 264 regions.  Each region has 1
memstore, which is a ConcurrentSkipListMap, and you're currently hitting
each CSLM with 8 cores, which might be too contentious.  Normally in
production you would want multiple memstores per CPU core.
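For illustration, a minimal sketch of pre-splitting a table into a fixed number
of regions with the HBase Java client API of that era; the table name, column
family, and key range below are assumptions, not taken from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                // Hypothetical table and column family names -- substitute your own.
                HTableDescriptor desc = new HTableDescriptor("usertable");
                desc.addFamily(new HColumnDescriptor("f1"));

                // Pre-create 264 regions (24 cores * 11 servers) by splitting an
                // assumed key range into evenly spaced split points.
                admin.createTable(desc,
                        Bytes.toBytes("user0000000000"),
                        Bytes.toBytes("user9999999999"),
                        264);
            } finally {
                admin.close();
            }
        }
    }

Note that evenly spaced split points only spread the load if the row keys are
actually distributed evenly across that range.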
On Mon, Mar 19, 2012 at 5:31 AM, Juhani Connolly <[EMAIL PROTECTED]> wrote:

> Actually we did try running off two machines, both running our own
> tests in parallel. Unfortunately the throughput just got split between
> them, giving the same total. We also did the same thing with iperf
> running from each machine to another machine, which indicated 800Mb/s
> of additional throughput between each pair of machines.
> However, we didn't run these tests very thoroughly, so I will revisit
> them as soon as I get back to the office. Thanks.
>
> On Mon, Mar 19, 2012 at 9:21 PM, Christian Schäfer <[EMAIL PROTECTED]>
> wrote:
> > Referring to my experience, I expect the client to be the bottleneck,
> > too.
> >
> > So try to increase the number of client machines (not client threads),
> > each with its own unshared network interface.
> >
> > In my case I could double write throughput by doubling the client
> > machine count, with a much smaller system than yours (5 machines, 4GB
> > RAM each).
> >
> > Good Luck
> > Chris
> >
> >
> >
> > ________________________________
> > From: Juhani Connolly <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: 13:02, Monday, March 19, 2012
> > Subject: Re: 0.92 and Read/writes not scaling
> >
> > I was concerned that might be the case too, which is why we ran the YCSB
> > tests in addition to our application-specific and general performance
> > tests. Checking profiles of the execution just showed the vast majority
> > of time spent waiting for responses. These were all run with 400
> > threads (though we tried more/less just in case).
> > 2012/03/19 20:57 "Mingjian Deng" <[EMAIL PROTECTED]>:
> >
> >> @Juhani:
> >> How many clients did you test? Maybe the bottleneck was the client?
> >>
> >> 2012/3/19 Ramkrishna.S.Vasudevan <[EMAIL PROTECTED]>
> >>
> >> > Hi Juhani
> >> >
> >> > Can you tell us more about how the regions are balanced?
> >> > Are you overloading only a specific region server?
> >> >
> >> > Regards
> >> > Ram
> >> >
> >> > > -----Original Message-----
> >> > > From: Juhani Connolly [mailto:[EMAIL PROTECTED]]
> >> > > Sent: Monday, March 19, 2012 4:11 PM
> >> > > To: [EMAIL PROTECTED]
> >> > > Subject: 0.92 and Read/writes not scaling
> >> > >
> >> > > Hi,
> >> > >
> >> > > We're running into a brick wall where our throughput numbers will
> >> > > not scale as we increase server counts, both with custom in-house
> >> > > tests and with YCSB.
> >> > >
> >> > > We're using HBase 0.92 on Hadoop 0.20.2 (we also experienced the
> >> > > same issues using 0.90 before switching our testing to this version).
> >> > >
> >> > > Our cluster consists of:
> >> > > - Namenode and HMaster on separate servers, 24 cores, 64GB
> >> > > - up to 11 datanode/regionserver nodes: 24 cores, 64GB, 4 * 1TB
> >> > > disks (hope to get this changed)
> >> > >
> >> > > We have adjusted our GC settings and MSLAB configuration:
> >> > >
> >> > >   <property>
> >> > >     <name>hbase.hregion.memstore.mslab.enabled</name>
> >> > >     <value>true</value>
> >> > >   </property>
> >> > >
> >> > >   <property>
> >> > >     <name>hbase.hregion.memstore.mslab.chunksize</name>
> >> > >     <value>2097152</value>
> >> > >   </property>
> >> > >
> >> > >   <property>
> >> > >     <name>hbase.hregion.memstore.mslab.max.allocation</name>
> >> > >     <value>1024768</value>
> >> > >   </property>
> >> > >
> >> > > HDFS xceivers is set to 8192.
> >> > >
> >> > > We've experimented with a variety of handler counts for the namenode,
> >> > > datanodes, and regionservers, with no change in throughput.
> >> > >
> >> > > For testing with YCSB, we do the following each time (with nothing
> >> > > else using the cluster):
> >> > > - truncate the test table