Re: 0.92 and Read/writes not scaling
When you increased regions on your previous test, did it start maxing out
CPU?  What improvement did you see?

Have you tried increasing the memstore flush size to something like 512MB?
 Maybe you're blocked on flushes.  40,000/second (4,000/server) is pretty slow
for a disabled WAL, I think, especially with a batch size of 10.  If you
increase the write batch size to 1,000, how much does your write throughput
increase?
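
(For reference, a minimal sketch of the kind of batching being suggested,
using the 0.92 client API; the table/family/qualifier names and the value
size are placeholders, not taken from this thread. The memstore flush size
itself is the hbase.hregion.memstore.flush.size setting in hbase-site.xml.)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchWriteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");   // placeholder table name
        List<Put> batch = new ArrayList<Put>(1000);
        for (int i = 0; i < 1000; i++) {
          Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
          // ~200-byte value, in line with the 128-256 byte entries mentioned below
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), new byte[200]);
          batch.add(put);
        }
        table.put(batch);   // one client call for the whole batch of 1,000
        table.close();
      }
    }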
On Fri, Mar 23, 2012 at 3:48 AM, Juhani Connolly <[EMAIL PROTECTED]> wrote:

> Also, the latency on requests is extremely long. If we group them into
> sets of 10 puts (128-256 bytes each) before flushing the client-side table
> buffer, latency is over 1 second.
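
(The grouped-put pattern described above presumably looks something like the
following sketch, reusing the imports and setup from the earlier sketch; the
names are placeholders. In the 0.92 client, disabling auto-flush buffers Puts
until flushCommits() is called.)

    HTable table = new HTable(HBaseConfiguration.create(), "testtable");
    table.setAutoFlush(false);            // buffer puts on the client side
    for (int i = 0; i < 10; i++) {        // group of 10 puts, 128-256 bytes each
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), new byte[200]);
      table.put(put);                     // stays in the write buffer, no RPC yet
    }
    table.flushCommits();                 // the >1 second latency is observed here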
>
> We get entries like this in our logs:
> 22:17:51,010 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
> {"processingtimems":16692,
>  "call":"multi(org.apache.hadoop.hbase.client.MultiAction@65312e3b), rpc version=1, client version=29, methodsFingerPrint=54742778",
>  "client":"10.172.109.3:42725",
>  "starttimems":1332335854317,
>  "queuetimems":6387,
>  "class":"HRegionServer",
>  "responsesize":0,
>  "method":"multi"}
>
> Any suggestions as to where we should be digging?
>
> On Fri, Mar 23, 2012 at 4:40 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
> > Status update:
> >
> > - We moved to CDH 4b1, so HBase 0.92 and HDFS 0.23 (until now we were
> > using the 0.20.2 series)
> > - We redid the tests with 256/512 regions, and the numbers do appear to
> > scale, which is good.
> >
> > BUT, our write throughput has taken a dive. If we disable WAL
> > writes, we still get nearly 40,000 a second, but with it on, we're
> > lucky to get more than 12,000. Before, we were getting as high as
> > 70,000 by grouping puts together. We have set up log collection and
> > are not finding anything unusual in the logs.
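
(For reference, disabling WAL writes in the 0.92 client, if that is how it is
being done here, is a per-Put setting; it trades durability for speed, since
unflushed edits are lost if a region server dies. Names are placeholders.)

    Put put = new Put(Bytes.toBytes("some-row"));   // placeholder row key
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), new byte[200]);
    put.setWriteToWAL(false);   // skip the write-ahead log for this edit
    table.put(put);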
> >
> > Mikael: One of the tests is the YCSB one, where we just let it choose
> > the entry size. Our own custom test has a configurable size, but we
> > have been testing with entries of 128-256 bytes, as this is what we
> > expect in our application. What exactly should we be looking at with
> > the storefiles?
> >
> >> On Wed, Mar 21, 2012 at 2:29 PM, Mikael Sitruk <[EMAIL PROTECTED]> wrote:
> >> Juhani,
> >> Can you look at the storefiles and tell how they behave during the test?
> >> What is the size of the data you insert/update?
> >> Mikael
> >> On Mar 20, 2012 8:10 PM, "Juhani Connolly" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Matt,
> >>>
> >>> This is something we haven't tested much; we were always running with
> >>> about 32 regions, which gave enough coverage for an even spread over
> >>> all machines.
> >>> I will run our tests with enough regions per server to cover all cores
> >>> and get back to the mailing list.
> >>>
> >>> On Tue, Mar 20, 2012 at 1:55 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >>> > I'd be curious to see what happens if you split the table into 1
> >>> > region per CPU core, so 24 cores * 11 servers = 264 regions.  Each
> >>> > region has 1 memstore, which is a ConcurrentSkipListMap, and you're
> >>> > currently hitting each CSLM with 8 cores, which might be too
> >>> > contentious.  Normally in production you would want multiple
> >>> > memstores per CPU core.
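
(A sketch of pre-splitting a table at creation time to hit a fixed region
count such as 264. It assumes uniformly distributed binary row keys and uses
placeholder table/family names, so the split points would need to match the
real key distribution.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PresplitSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("testtable");  // placeholder
        desc.addFamily(new HColumnDescriptor("cf"));                // placeholder
        // 263 split keys give 264 regions: two-byte prefixes spread evenly
        // over the 16-bit key space.
        byte[][] splits = new byte[263][];
        for (int i = 0; i < splits.length; i++) {
          int boundary = (i + 1) * 65536 / 264;
          splits[i] = new byte[] { (byte) (boundary >>> 8), (byte) boundary };
        }
        admin.createTable(desc, splits);
      }
    }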
> >>> >
> >>> >
> >>> > On Mon, Mar 19, 2012 at 5:31 AM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
> >>> >
> >>> >> Actually we did try running off two machines, both running our own
> >>> >> tests in parallel. Unfortunately the result was just a split that
> >>> >> added up to the same total throughput. We also did the same thing
> >>> >> with iperf running from each machine to another machine, indicating
> >>> >> 800Mb of additional throughput between each pair of machines.
> >>> >> However, we didn't try these tests very thoroughly, so I will
> >>> >> revisit them as soon as I get back to the office, thanks.
> >>> >>
> >>> >> On Mon, Mar 19, 2012 at 9:21 PM, Christian Schäfer <[EMAIL PROTECTED]> wrote:
> >>> >> > Referring to my experience, I expect the client to be the bottleneck,