Re: High IO Usage in Datanodes due to Replication
Thanks a lot, Harsh. Your input is really valuable to me.

As you mentioned above, our cluster is overloaded with many small files.

Also, when I load large amounts of data into Hive tables, it throws an
exception like "replicated to 0 nodes instead of 1". When I googled it, I
found that one of the listed reasons matches my case: "Data Node is Busy
with block report and block scanning" @ http://bit.ly/ZToyNi

Will increasing the block scanning rate, so that all the inefficient small
files get scanned, fix my problem?

Thanks
Selva
On Wed, May 1, 2013 at 11:37 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> The block scanner is a simple, independent operation of the DN that
> runs periodically and does work in small phases, to ensure that no
> blocks exist that fail to match their checksums (it's an automatic
> data validator), so that it can report corrupt/rotting blocks and
> keep the cluster healthy.
>
> Its runtime shouldn't cause any issues, unless your DN has a lot of
> blocks (more than normal due to overload of small, inefficient files)
> but too little heap size to perform retention plus block scanning.
>
> > 1. Will the data node not allow data to be written during the
> > DataBlockScanning process?
>
> No such thing. As I said, it's independent and mostly lock free. Writes
> or reads are not hampered.
>
> > 2. Will the data node return to normal only when "Not yet verified"
> > drops to zero in the data node blockScannerReport?
>
> Yes, but note that this runs over and over again (once every 3 weeks IIRC).
>
> On Wed, May 1, 2013 at 11:33 AM, selva <[EMAIL PROTECTED]> wrote:
> > Thanks Harsh & Manoj for the inputs.
> >
> > Now I have found that the data node is busy with block scanning. I have
> > TBs of data attached to each data node, so it is taking days to complete
> > the data block scanning. I have two questions.
> >
> > 1. Will the data node not allow data to be written during the
> > DataBlockScanning process?
> >
> > 2. Will the data node return to normal only when "Not yet verified"
> > drops to zero in the data node blockScannerReport?
> >
> > # Data node logs
> >
> > 2013-05-01 05:53:50,639 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-7605405041820244736_20626608
> > 2013-05-01 05:53:50,664 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1425088964531225881_20391711
> > 2013-05-01 05:53:50,692 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2259194263704433881_10277076
> > 2013-05-01 05:53:50,740 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2653195657740262633_18315696
> > 2013-05-01 05:53:50,818 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-5124560783595402637_20821252
> > 2013-05-01 05:53:50,866 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6596021414426970798_19649117
> > 2013-05-01 05:53:50,931 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7026400040099637841_20741138
> > 2013-05-01 05:53:50,992 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8535358360851622516_20694185
> > 2013-05-01 05:53:51,057 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7959856580255809601_20559830
> >
> > # Block scanner report from one of my data nodes
> >
> > http://<datanode-host>:15075/blockScannerReport
> >
> > Total Blocks                 : 2037907
> > Verified in last hour        :   4819
> > Verified in last day         : 107355
> > Verified in last week        : 686873
> > Verified in last four weeks  : 1589964
> > Verified in SCAN_PERIOD      : 1474221
> > Not yet verified             : 447943
> > Verified since restart       : 318433
> > Scans since restart          : 318058
> > Scan errors since restart    :      0
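
The report above gives enough numbers to estimate how long the remaining scan might take. The following is a minimal Python sketch, not part of the original thread. It assumes the report is available as plain text without the "> > " quote prefixes, and that the scanner keeps verifying blocks at the rate seen over the last day, which may not hold in practice. The helper names parse_report and hours_to_finish are hypothetical.

```python
import re

# Report text copied from the message above (quote prefixes removed).
REPORT = """\
Total Blocks                 : 2037907
Verified in last hour        :   4819
Verified in last day         : 107355
Verified in last week        : 686873
Verified in last four weeks  : 1589964
Verified in SCAN_PERIOD      : 1474221
Not yet verified             : 447943
Verified since restart       : 318433
Scans since restart          : 318058
Scan errors since restart    :      0
"""


def parse_report(text):
    """Parse 'label : integer' lines into a dict keyed by label."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def hours_to_finish(stats, rate_key="Verified in last day", window_hours=24):
    """Estimate hours until 'Not yet verified' reaches zero.

    Assumption: the scan rate observed over the chosen window stays constant.
    """
    blocks_per_hour = stats[rate_key] / window_hours
    return stats["Not yet verified"] / blocks_per_hour


stats = parse_report(REPORT)
pct_unverified = 100 * stats["Not yet verified"] / stats["Total Blocks"]
eta_hours = hours_to_finish(stats)

print(f"Unverified blocks: {pct_unverified:.1f}% of total")
print(f"Estimated time to finish: {eta_hours:.0f} hours ({eta_hours / 24:.1f} days)")
```

With the figures in this report, about 22% of blocks are still unverified, and at the last day's rate the backlog would take roughly 100 hours (a little over four days) to clear. This is consistent with the observation that scanning takes days. It is only a rough estimate, since the scanner runs continuously and restarts its cycle every scan period.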
