MapReduce >> mail # user >> Re: High IO Usage in Datanodes due to Replication


selva 2013-05-01, 06:32
Harsh J 2013-05-01, 08:55
selva 2013-05-01, 10:09
Re: High IO Usage in Datanodes due to Replication
Thanks Harsh & Manoj for the inputs.

I have now found that the data node is busy with block scanning. I have TBs
of data attached to each data node, so it is taking days to complete the data
block scanning. I have two questions.

1. Will the data node refuse to accept writes during the DataBlockScanning
process?

2. Will the data node return to normal only when "Not yet verified" drops to
zero in the data node's blockScannerReport?

# Data node logs

2013-05-01 05:53:50,639 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-7605405041820244736_20626608
2013-05-01 05:53:50,664 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-1425088964531225881_20391711
2013-05-01 05:53:50,692 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_2259194263704433881_10277076
2013-05-01 05:53:50,740 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_2653195657740262633_18315696
2013-05-01 05:53:50,818 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-5124560783595402637_20821252
2013-05-01 05:53:50,866 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_6596021414426970798_19649117
2013-05-01 05:53:50,931 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_7026400040099637841_20741138
2013-05-01 05:53:50,992 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_8535358360851622516_20694185
2013-05-01 05:53:51,057 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_7959856580255809601_20559830

# Block scanner report from one of my data nodes

http://<datanode-host>:15075/blockScannerReport

Total Blocks                 : 2037907
Verified in last hour        :   4819
Verified in last day         : 107355
Verified in last week        : 686873
Verified in last four weeks  : 1589964
Verified in SCAN_PERIOD      : 1474221
Not yet verified             : 447943
Verified since restart       : 318433
Scans since restart          : 318058
Scan errors since restart    :      0
Transient scan errors        :      0
Current scan rate limit KBps :   3205
Progress this period         :    101%
Time left in cur period      :  86.02%
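
As a rough check on the "taking days" observation, the report above can be parsed and the backlog divided by the recent verification rate. This is only a sketch: it assumes the rate seen in the last hour stays constant, which the report does not guarantee, and the function names are illustrative rather than part of Hadoop.

```python
# Report text copied verbatim from the message above.
REPORT = """\
Total Blocks                 : 2037907
Verified in last hour        :   4819
Verified in last day         : 107355
Verified in last week        : 686873
Verified in last four weeks  : 1589964
Verified in SCAN_PERIOD      : 1474221
Not yet verified             : 447943
Verified since restart       : 318433
Scans since restart          : 318058
Scan errors since restart    :      0
Transient scan errors        :      0
Current scan rate limit KBps :   3205
Progress this period         :    101%
Time left in cur period      :  86.02%
"""


def parse_report(text):
    """Return a dict mapping each report label to its raw string value."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields


def hours_to_clear_backlog(fields):
    """Estimate hours until 'Not yet verified' reaches zero.

    Assumption: the 'Verified in last hour' rate holds steady.
    """
    pending = int(fields["Not yet verified"])
    per_hour = int(fields["Verified in last hour"])
    if per_hour == 0:
        return float("inf")
    return pending / per_hour


fields = parse_report(REPORT)
hours = hours_to_clear_backlog(fields)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")  # 93.0 hours (~3.9 days)
```

With these numbers the backlog of 447943 blocks at 4819 blocks per hour works out to roughly four days, which is consistent with the scan taking days.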

Thanks
Selva
-----Original Message-----
From: "S, Manoj" <[EMAIL PROTECTED]>
Subject: RE: High IO Usage in Datanodes due to Replication
Date: Mon, 29 Apr 2013 06:41:31 GMT
Adding to Harsh's comments:

You can also tweak a few OS-level parameters to improve I/O performance.
1) Mount the filesystem with the "noatime" option.
2) Check whether changing the I/O scheduling algorithm improves the
cluster's performance.
(Check this file: /sys/block/<device_name>/queue/scheduler)
3) If there are lots of I/O requests and your cluster hangs because of
that, you can increase the queue length by increasing the value in
/sys/block/<device_name>/queue/nr_requests.
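
Before changing either sysfs value it helps to see what is currently set. The sketch below reads the two files named above; on Linux the scheduler file lists the available schedulers with the active one in square brackets. The function names and the `sysfs_root` parameter are illustrative, and the device name passed in is whatever your data disks are called.

```python
from pathlib import Path


def active_scheduler(scheduler_line):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Read the current I/O scheduler and queue length for one block device."""
    queue = Path(sysfs_root) / device / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "nr_requests": int((queue / "nr_requests").read_text()),
    }
```

For example, `read_queue_settings("sda")` returns a dict with the active scheduler and the current `nr_requests` value for a device named `sda` (a placeholder name). Writing new values to these files requires root and does not persist across reboots.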

-----Original Message-----
From: Harsh J [mailto:[EMAIL PROTECTED]]
Sent: Sunday, April 28, 2013 12:03 AM
To: <[EMAIL PROTECTED]>
Subject: Re: High IO Usage in Datanodes due to Replication

They seem to be transferring blocks between one another. This is most
likely due to under-replication, and the NN UI will have numbers on the work
left to perform. The inter-DN transfer is controlled by the balancing
bandwidth though, so you can lower that if you want to throttle it, but
it will then take longer to get back to a perfectly replicated state.
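
The trade-off described here is simple arithmetic: the lower the per-node transfer limit, the longer the cluster stays under-replicated. The sketch below illustrates it with made-up numbers; the 100 GiB backlog and the bandwidth values are hypothetical, not taken from this cluster. In Hadoop 1.x releases the limit is usually the dfs.balance.bandwidthPerSec setting, in bytes per second, but treat that name as an assumption and check it against your version.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_hours(bytes_to_move, bandwidth_bytes_per_sec):
    """Hours needed to move bytes_to_move at a fixed per-node bandwidth cap."""
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_to_move / bandwidth_bytes_per_sec / 3600


backlog = 100 * GIB  # hypothetical amount one data node still has to transfer

for mib_per_sec in (1, 5, 10):  # hypothetical bandwidth caps
    hours = transfer_hours(backlog, mib_per_sec * MIB)
    print(f"{mib_per_sec:>2} MiB/s -> {hours:.1f} hours")
```

Halving the cap halves the I/O pressure from transfers but doubles the time spent with fewer replicas than configured.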

On Sat, Apr 27, 2013 at 11:33 PM, selva <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I have lost the Amazon instances of my Hadoop cluster, but I had all the
> data in AWS EBS volumes. So I launched new instances and attached the volumes.
>
> But all of the datanode logs keep printing the lines below, which causes
> a high IO rate. Due to the IO usage I am not able to run any jobs.
>
> Can anyone help me to understand what it is doing? Thanks in advance.

Harsh J