HBase >> mail # user >> Re: debugging responseTooSlow

Re: debugging responseTooSlow
Viral:
Did you use YCSB or LoadTestTool ?

Was the load spread relatively evenly across your servers ?

Thanks
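
A minimal sketch, assuming each wrapped responseTooSlow entry has first been rejoined onto one line, of how the JSON payload could be decoded to recover when the call started and finished. The names parse_slow_response, SLOW_RE, EPOCH and SAMPLE are made up for illustration; the field names and the sample entry come from the log excerpt quoted below.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Captures the JSON payload that follows the "(responseTooSlow):" marker.
SLOW_RE = re.compile(r"\(responseTooSlow\):\s*(\{.*\})")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# First entry from the thread, rejoined onto a single line.
SAMPLE = (
    '2013-02-16 02:37:33,368 WARN org.apache.hadoop.ipc.HBaseServer: '
    '(responseTooSlow): {"processingtimems":97509,'
    '"call":"multi(org.apache.hadoop.hbase.client.MultiAction@1c3308bd), '
    'rpc version=1, client version=29, methodsFingerPrint=-1368823753",'
    '"client":"10.149.10.10:41009","starttimems":1360982155855,'
    '"queuetimems":0,"class":"HRegionServer","responsesize":0,'
    '"method":"multi"}'
)


def parse_slow_response(line):
    """Return the decoded payload plus start/end times in UTC, or None."""
    match = SLOW_RE.search(line)
    if match is None:
        return None
    record = json.loads(match.group(1))
    # starttimems is treated as milliseconds since the Unix epoch.
    start = EPOCH + timedelta(milliseconds=record["starttimems"])
    end = start + timedelta(milliseconds=record["processingtimems"])
    record["start_utc"] = start
    record["end_utc"] = end
    return record


if __name__ == "__main__":
    rec = parse_slow_response(SAMPLE)
    print(rec["client"], rec["method"], rec["processingtimems"], "ms")
    print("started :", rec["start_utc"].isoformat())
    print("finished:", rec["end_utc"].isoformat())
```

For the first entry this gives a start of 02:35:55.855 UTC and a finish of 02:37:33.364 UTC, which lines up with the 02:37:33,368 timestamp on the WARN line if the server logs in UTC. Since queuetimems is 0, nearly all of the 97.5 seconds was spent processing the call rather than waiting in the RPC queue.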

On Fri, Feb 15, 2013 at 9:19 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:

> Yeah, I noticed very high latency around the time of the slow response;
> basically, my client timed out for those requests. I have pre-split the
> table into 128 regions. Unfortunately, I didn't have ganglia installed. I
> will install ganglia on those boxes, run the perf test again, and post the
> results.
>
> Regarding the I/O wait, the timeout only happened on one box, or at least
> that's what I saw in the logs. When I run the test again with ganglia on, I
> will verify whether it only happens on one node.
>
> Thanks,
> Viral
>
> On Fri, Feb 15, 2013 at 8:09 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>
> > If you take a look at sar from 2013-02-16 on
> > 10.149.10.10<http://10.149.10.10:41017/> do you see any major I/O wait,
> > swapping, or anything out of the norm?  Is this occurring on all three
> > region servers?  When the perf test is running can you verify you are
> > writing to all three nodes?
> >
> > On Fri, Feb 15, 2013 at 11:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > The slow response took about 1.5 minutes. During this period, did you
> > > observe high latency ?
> > >
> > > If you have Ganglia installed on master / NN node, do you observe any
> > > abnormal spike ?
> > >
> > > BTW did you presplit your table ?
> > >
> > > Thanks
> > >
> > > On Fri, Feb 15, 2013 at 7:14 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > (using hbase 0.94.4 and hadoop 1.0.4)
> > > >
> > > > I have been seeing a lot of the following WARN in my logs:
> > > >
> > > > 2013-02-16 02:37:11,409 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=25.18 MB, free=2.97 GB, max=3 GB, blocks=1, accesses=52, hits=51, hitRatio=98.07%, , cachingAccesses=52, cachingHits=51, cachingHitsRatio=98.07%, , evictions=0, evicted=0, evictedPerRun=NaN
> > > > 2013-02-16 02:37:33,368 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":97509,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@1c3308bd), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41009","starttimems":1360982155855,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > > 2013-02-16 02:38:37,377 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":97191,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3eafc7ae), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41014","starttimems":1360982220183,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > > 2013-02-16 02:39:29,842 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":85300,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3d615428), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41017","starttimems":1360982284538,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > >
> > > > It's strange because this is a new hbase setup with almost no traffic on
> > > > it. I am running a perf test and would not expect this to happen. The
> > > > regionservers have 12GB heap space and are only using 1GB when that error
> > > > happens. I just pushed close to 33K rows via a batch and I see the
> > > > responseTooSlow.
> > > >
> > > > I enabled GC logging, but I don't see any GC lockups, and each GC attempt
> > > > is only taking a few hundred ms.
> > > >
> > > > What else could be happening here, any pointers on debugging ? My