HBase, mail # dev - HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)


Re: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
lars hofhansl 2014-01-17, 16:49
One other question: Did you give the RS the default 1000 MB heap?
For real work you have to increase that. Maybe try 4000 or 8000 MB on those boxes.

(In any case, that will not solve any problem with unreachable or unavailable data nodes.)
-- Lars
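
[Editor's note: a minimal sketch of how the regionserver heap is usually raised in 0.94-era deployments, via conf/hbase-env.sh. The value shown simply mirrors Lars's suggestion above and is illustrative; size it to the instance's RAM rather than treating it as a recommendation.]

    # conf/hbase-env.sh -- HBASE_HEAPSIZE is given in MB and defaults to 1000
    # 8000 follows the suggestion above; leave headroom for the OS and the DataNode process
    export HBASE_HEAPSIZE=8000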

----- Original Message -----
From: Vladimir Rodionov <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc: lars hofhansl <[EMAIL PROTECTED]>
Sent: Thursday, January 16, 2014 6:46 AM
Subject: Re: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

1.0.4 (default).
On Thu, Jan 16, 2014 at 12:24 AM, 谢良 <[EMAIL PROTECTED]> wrote:

> Just curious, what's your Hadoop version, Vladimir?
> At least on Hadoop 2.0+, the default ReplaceDatanode policy is expected to
> pick another datanode when setting up the pipeline, so with only one broken
> datanode the write should still reach three nodes successfully, and HBase's
> "hbase.regionserver.hlog.tolerable.lowreplication" check will not kick in :)
>
> Thanks,
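
[Editor's note: the behaviour Liang Xie describes is controlled by the Hadoop 2.x client-side pipeline-recovery settings below; they do not exist in 1.0.4. This is a sketch of the relevant hdfs-site.xml entries, shown with what are believed to be the defaults.]

    <!-- hdfs-site.xml, Hadoop 2.0+ client settings -->
    <property>
      <!-- allow the client to replace a failed datanode in an open write pipeline -->
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <!-- DEFAULT requests a replacement only when the pipeline shrinks below a safe size;
           other accepted values are ALWAYS and NEVER -->
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>DEFAULT</value>
    </property>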
> ________________________________________
> From: Vladimir Rodionov [[EMAIL PROTECTED]]
> Sent: January 16, 2014 14:45
> To: [EMAIL PROTECTED]
> Cc: lars hofhansl
> Subject: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
>
> This is what I found in an RS log:
> 2014-01-16 01:22:18,256 ResponseProcessor for block blk_5619307008368309102_2603 WARN  [DFSClient] DFSOutputStream ResponseProcessor exception for block blk_5619307008368309102_2603
> java.io.IOException: Bad response 1 for block blk_5619307008368309102_2603 from datanode 10.38.106.234:50010
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2977)
>
> 2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 bad datanode[2] 10.38.106.234:50010
> 2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 in pipeline 10.10.25.199:50010, 10.40.249.135:50010, 10.38.106.234:50010: bad datanode 10.38.106.234:50010
> 2014-01-16 01:22:22,800 IPC Server handler 10 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> 2014-01-16 01:22:22,806 IPC Server handler 2 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> 2014-01-16 01:22:22,808 IPC Server handler 28 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> 2014-01-16 01:22:22,808 IPC Server handler 13 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> 2014-01-16 01:22:22,808 IPC Server handler 27 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> 2014-01-16 01:22:22,811 IPC Server handler 22 on 60020 WARN  [HLog] Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
> 2014-01-16 01:22:22,911 IPC Server handler 8 on 60020 INFO  [HLog] LowReplication-Roller was enabled.
> 2014-01-16 01:22:22,930 regionserver60020.cacheFlusher INFO  [HRegion] Finished memstore flush of ~128.3m/134538640, currentsize=3.0m/3113200 for region usertable,,1389844429593.d4843a72f02a7396244930162fbecd06. in 68096ms, sequenceid=108753, compaction requested=false
> 2014-01-16 01:22:22,930 regionserver60020.logRoller INFO  [FSUtils]
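
[Editor's note: the "Requesting close of hlog" and "LowReplication-Roller" messages above come from the regionserver's HLog low-replication check. Below is a sketch of the hbase-site.xml settings behind that check, shown with what are believed to be the 0.94 defaults; the tolerable-replica count normally falls back to the HDFS default replication, typically 3.]

    <!-- hbase-site.xml -->
    <property>
      <!-- request an hlog roll when the write pipeline reports fewer replicas than this -->
      <name>hbase.regionserver.hlog.tolerable.lowreplication</name>
      <value>3</value>
    </property>
    <property>
      <!-- after this many consecutive low-replication rolls, stop rolling and only warn
           until replication recovers -->
      <name>hbase.regionserver.hlog.lowreplication.rolllimit</name>
      <value>5</value>
    </property>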