MapReduce >> mail # user >> Re: High IO Usage in Datanodes due to Replication


Re: High IO Usage in Datanodes due to Replication
Thanks, Harsh and Manoj, for the inputs.

I have now found that the data node is busy with block scanning. Each
data node has terabytes of data attached, so the block scan is taking
days to complete. I have two questions:

1. Does the data node refuse writes while the DataBlockScanner process
is running?

2. Will the data node return to normal only once "Not yet verified"
drops to zero in its blockScannerReport?

# Data node logs

2013-05-01 05:53:50,639 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-7605405041820244736_20626608
2013-05-01 05:53:50,664 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1425088964531225881_20391711
2013-05-01 05:53:50,692 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2259194263704433881_10277076
2013-05-01 05:53:50,740 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2653195657740262633_18315696
2013-05-01 05:53:50,818 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-5124560783595402637_20821252
2013-05-01 05:53:50,866 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6596021414426970798_19649117
2013-05-01 05:53:50,931 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7026400040099637841_20741138
2013-05-01 05:53:50,992 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8535358360851622516_20694185
2013-05-01 05:53:51,057 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7959856580255809601_20559830
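
One rough way to gauge how fast the scanner is working is to count the
"Verification succeeded" entries and the time they span. The sketch below
assumes the log format matches the excerpt above; the names scan_rate and
SAMPLE are made up for illustration and are not part of Hadoop. It also
tolerates entries that have been wrapped across several lines.

```python
import re
from datetime import datetime, timedelta

# Timestamp that opens each DataNode log entry, e.g. "2013-05-01 05:53:50,639".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\b")
MARKER = "DataBlockScanner: Verification succeeded"


def scan_rate(log_text):
    """Return (verified_count, seconds_spanned) for block scanner entries."""
    # Glue wrapped continuation lines back onto the entry they belong to.
    entries = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line

    # Collect the timestamp of every successful verification.
    stamps = []
    for entry in entries:
        m = TS.match(entry)
        if m and MARKER in entry:
            base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            stamps.append(base + timedelta(milliseconds=int(m.group(2))))

    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# Three entries copied from the excerpt above (hypothetical sample input).
SAMPLE = """\
2013-05-01 05:53:50,639 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-7605405041820244736_20626608
2013-05-01 05:53:50,664 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-1425088964531225881_20391711
2013-05-01 05:53:51,057 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_7959856580255809601_20559830
"""

if __name__ == "__main__":
    count, seconds = scan_rate(SAMPLE)
    print(f"{count} blocks verified over {seconds:.3f} s")
```

A window this short says little on its own; running the same count over
an hour or more of log gives a steadier figure.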

# Block scanner report from one of my data nodes

http://<datanode-host>:15075/blockScannerReport

Total Blocks                 : 2037907
Verified in last hour        :   4819
Verified in last day         : 107355
Verified in last week        : 686873
Verified in last four weeks  : 1589964
Verified in SCAN_PERIOD      : 1474221
Not yet verified             : 447943
Verified since restart       : 318433
Scans since restart          : 318058
Scan errors since restart    :      0
Transient scan errors        :      0
Current scan rate limit KBps :   3205
Progress this period         :    101%
Time left in cur period      :  86.02%
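
As a back-of-the-envelope check on the "taking days" observation, the
report can be parsed and the "Not yet verified" count divided by the
"Verified in last hour" count. This is only a sketch: it assumes the scan
rate stays constant and that every verification reduces the unverified
backlog, which is not necessarily how the scanner schedules its work. The
function names are hypothetical.

```python
def parse_report(text):
    """Parse 'Label : value' lines of a blockScannerReport into a dict."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, _, value = line.rpartition(":")
        value = value.strip().rstrip("%")
        try:
            stats[label.strip()] = float(value)
        except ValueError:
            pass  # skip lines whose value is not numeric
    return stats


def hours_to_clear_backlog(stats):
    """Estimate hours until 'Not yet verified' reaches zero at the last hour's rate."""
    per_hour = stats["Verified in last hour"]
    if per_hour <= 0:
        return float("inf")
    return stats["Not yet verified"] / per_hour


# Figures copied from the report above.
REPORT = """\
Total Blocks                 : 2037907
Verified in last hour        :   4819
Not yet verified             : 447943
Current scan rate limit KBps :   3205
Time left in cur period      :  86.02%
"""

if __name__ == "__main__":
    hours = hours_to_clear_backlog(parse_report(REPORT))
    print(f"about {hours:.0f} hours ({hours / 24:.1f} days) at the current rate")
```

With the numbers above this comes to roughly 93 hours, a little under
four days, for this one node.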

Thanks
Selva
-----Original Message-----
From: "S, Manoj" <[EMAIL PROTECTED]>
Subject: RE: High IO Usage in Datanodes due to Replication
Date: Mon, 29 Apr 2013 06:41:31 GMT
Adding to Harsh's comments:

You can also tweak a few OS-level parameters to improve I/O performance.
1) Mount the filesystem with the "noatime" option.
2) Check whether changing the I/O scheduling algorithm improves the
cluster's performance.
(The current scheduler is listed in /sys/block/<device_name>/queue/scheduler.)
3) If there are many I/O requests and your cluster hangs because of
them, you can increase the queue length by raising the value in
/sys/block/<device_name>/queue/nr_requests.
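
Before changing either setting it helps to see the current values. The
sketch below reads the two sysfs files mentioned above; on Linux the
scheduler file lists the available schedulers with the active one in
square brackets. The helper names and the sysfs_root parameter are
illustrative assumptions, and "sda" is only an example device name.

```python
from pathlib import Path


def active_scheduler(scheduler_line):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def queue_settings(device, sysfs_root="/sys/block"):
    """Read the active I/O scheduler and request queue length for one device."""
    queue = Path(sysfs_root) / device / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "nr_requests": int((queue / "nr_requests").read_text()),
    }


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device backing the DataNode volume.
    print(queue_settings("sda"))
```

Writing new values to these files requires root, and the change does not
persist across reboots unless it is also applied at boot time.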

-----Original Message-----
From: Harsh J [mailto:[EMAIL PROTECTED]]
Sent: Sunday, April 28, 2013 12:03 AM
To: <[EMAIL PROTECTED]>
Subject: Re: High IO Usage in Datanodes due to Replication

They seem to be transferring blocks between one another. This is most
likely due to under-replication, and the NN UI will have numbers on the
work left to perform. The inter-DN transfer is controlled by the
balancing bandwidth, though, so you can lower that if you want to
throttle it, but it will then take longer to get back to a fully
replicated state.

On Sat, Apr 27, 2013 at 11:33 PM, selva <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I have lost the Amazon instances of my Hadoop cluster, but I had all the
> data in AWS EBS volumes, so I launched new instances and attached the
> volumes.
>
> But all of the datanode logs keep printing the lines below, which causes
> a high IO rate. Due to the IO usage I am not able to run any jobs.
>
> Can anyone help me understand what it is doing? Thanks in advance.

Harsh J