NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HDFS >> mail # user >> High IO Usage in Datanodes due to Replication


selva 2013-04-27, 18:03
Harsh J 2013-04-27, 18:33
S, Manoj 2013-04-29, 06:41

Re: High IO Usage in Datanodes due to Replication
The block scanner is a simple, independent operation of the DN that
runs periodically and does its work in small phases, to ensure that no
blocks exist that don't match their checksums (it's an automatic
data validator), so that it can report corrupt/rotting blocks and
keep the cluster healthy.
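To make the idea concrete, here is a toy Python sketch of what a checksum-verifying scanner does; it is not the DataNode's actual code. It assumes one CRC32 per 512-byte chunk (512 bytes is HDFS's default chunk size; newer releases default to CRC32C rather than the zlib CRC32 used here), and the helper names chunk_checksums, verify_block and scan_phase are hypothetical. The real scanner also throttles its I/O, which this sketch leaves out.

```python
import zlib

# Assumption: one CRC32 per 512-byte chunk, mirroring HDFS's default chunk size.
BYTES_PER_CHECKSUM = 512


def chunk_checksums(data: bytes, chunk: int = BYTES_PER_CHECKSUM) -> list[int]:
    """One CRC32 per fixed-size chunk, as if recorded when the block was written."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def verify_block(data: bytes, stored: list[int], chunk: int = BYTES_PER_CHECKSUM) -> bool:
    """Re-read the block and compare it against its stored checksums."""
    return chunk_checksums(data, chunk) == stored


def scan_phase(blocks: dict[str, tuple[bytes, list[int]]],
               start: int, batch: int) -> tuple[list[str], int]:
    """Verify up to `batch` blocks starting at index `start` (one small phase).

    Returns the IDs that failed verification and the index to resume from;
    the index wraps to 0 after the last block, so the scan repeats forever.
    """
    ids = sorted(blocks)
    end = min(start + batch, len(ids))
    corrupt = []
    for block_id in ids[start:end]:
        data, stored = blocks[block_id]
        if not verify_block(data, stored):
            corrupt.append(block_id)  # a real DN would report this block to the NameNode
    return corrupt, (end if end < len(ids) else 0)
```

Working in small batches is what keeps the scanner from competing much with normal reads and writes: each call touches only a few blocks and remembers where to resume.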

Its runtime shouldn't cause any issues unless your DN has a lot of
blocks (more than normal, due to an overload of small, inefficient files)
but too little heap to handle both block retention and block scanning.

> 1. Will the data node refuse writes during the DataBlockScanner process?

No such thing. As I said, it's independent and mostly lock-free. Writes
and reads are not hampered.

> 2. Will the data node return to normal only when "Not yet verified" reaches zero in the data node blockScannerReport?

Yes, but note that this runs over and over again (once every 3 weeks IIRC).
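A rough way to answer the "how many days" question from the report itself: the Python sketch below parses the blockScannerReport lines posted in this thread and estimates how long "Not yet verified" would take to reach zero, assuming the "Verified in last day" rate stays constant. It also compares that rate with the rate needed to cover every block once per scan period, taking the three-week figure above as the period (as far as I know this is governed by dfs.datanode.scan.period.hours, 504 hours by default in releases of this era). The function names are hypothetical, and the report format is taken from the output quoted below, so other versions may differ.

```python
import re

# Excerpt of the blockScannerReport output quoted in this thread.
REPORT = """\
Total Blocks                 : 2037907
Verified in last day         : 107355
Not yet verified             : 447943
Current scan rate limit KBps :   3205
"""

SCAN_PERIOD_DAYS = 21  # assumption: "once every 3 weeks"


def parse_report(text: str) -> dict[str, float]:
    """Turn 'Name : value' lines into a dict, dropping any trailing '%'."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*([\d.]+)%?\s*$", line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


def days_until_verified(stats: dict[str, float]) -> float:
    """Days to clear 'Not yet verified', assuming the last-day rate holds."""
    return stats["Not yet verified"] / stats["Verified in last day"]


def required_daily_rate(stats: dict[str, float], period_days: int = SCAN_PERIOD_DAYS) -> float:
    """Blocks per day needed to verify every block once per scan period."""
    return stats["Total Blocks"] / period_days


stats = parse_report(REPORT)
print(f"{days_until_verified(stats):.1f} days to clear the backlog")  # prints 4.2 days ...
print(f"{required_daily_rate(stats):.0f} blocks/day needed")
```

With these numbers the backlog clears in roughly four days, and the last-day rate (107355) is above the roughly 97k blocks/day needed for a three-week pass, so the scanner looks on pace rather than stuck.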

On Wed, May 1, 2013 at 11:33 AM, selva <[EMAIL PROTECTED]> wrote:
> Thanks Harsh & Manoj for the inputs.
>
> Now I have found that the data node is busy with block scanning. I have TBs of data
> attached to each data node, so it's taking days to complete the block
> scanning. I have two questions.
>
> 1. Will the data node refuse writes during the DataBlockScanner
> process?
>
> 2. Will the data node return to normal only when "Not yet verified" reaches zero
> in the data node blockScannerReport?
>
> # Data node logs
>
> 2013-05-01 05:53:50,639 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-7605405041820244736_20626608
> 2013-05-01 05:53:50,664 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-1425088964531225881_20391711
> 2013-05-01 05:53:50,692 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_2259194263704433881_10277076
> 2013-05-01 05:53:50,740 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_2653195657740262633_18315696
> 2013-05-01 05:53:50,818 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-5124560783595402637_20821252
> 2013-05-01 05:53:50,866 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_6596021414426970798_19649117
> 2013-05-01 05:53:50,931 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_7026400040099637841_20741138
> 2013-05-01 05:53:50,992 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_8535358360851622516_20694185
> 2013-05-01 05:53:51,057 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_7959856580255809601_20559830
>
> # Block scanner report from one of my data nodes
>
> http://<datanode-host>:15075/blockScannerReport
>
> Total Blocks                 : 2037907
> Verified in last hour        :   4819
> Verified in last day         : 107355
> Verified in last week        : 686873
> Verified in last four weeks  : 1589964
> Verified in SCAN_PERIOD      : 1474221
> Not yet verified             : 447943
> Verified since restart       : 318433
> Scans since restart          : 318058
> Scan errors since restart    :      0
> Transient scan errors        :      0
> Current scan rate limit KBps :   3205
> Progress this period         :    101%
> Time left in cur period      :  86.02%
>
> Thanks
> Selva
>
>
> -----Original Message-----
> From "S, Manoj" <[EMAIL PROTECTED]>
> Subject RE: High IO Usage in Datanodes due to Replication
> Date Mon, 29 Apr 2013 06:41:31 GMT
> Adding to Harsh's comments:
>
> You can also tweak a few OS level parameters to improve the I/O performance.
> 1) Mount the filesystem with "noatime" option.
> 2) Check whether changing the I/O scheduling algorithm improves the
> cluster's performance.
> (Check this file /sys/block/<device_name>/queue/scheduler)
> 3) If there are lots of I/O requests and your cluster hangs because of that,

Harsh J
Harsh J 2013-05-01, 10:15