MapReduce, mail # user - Re: High IO Usage in Datanodes due to Replication


Re: High IO Usage in Datanodes due to Replication
selva 2013-05-01, 10:09
Hi Harsh,

You are right. Our Hadoop version is "0.20.2-cdh3u1", which lacks
HDFS-2379.

As you suggested, I have doubled the DN heap size. Now I will monitor the
block scanning speed.
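
One rough way to watch that speed is to count the DataBlockScanner "Verification succeeded" lines in the DN log and divide by the time they span. The sketch below assumes each log entry sits on a single line in the format quoted further down in this thread (the mail client wrapped them here). The names LINE_RE, SAMPLE and scan_rate are made up for illustration and are not part of Hadoop.

```python
import re
from datetime import datetime

# Pattern for DataBlockScanner success entries, based only on the log
# lines quoted in this thread (assumed to be one entry per line).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.hdfs\.server\.datanode\.DataBlockScanner: "
    r"Verification succeeded for (blk_-?\d+_\d+)"
)

PREFIX = "org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for "

# The four entries from the thread, joined back onto single lines.
SAMPLE = [
    "2013-05-01 05:53:50,639 INFO " + PREFIX + "blk_-7605405041820244736_20626608",
    "2013-05-01 05:53:50,664 INFO " + PREFIX + "blk_-1425088964531225881_20391711",
    "2013-05-01 05:53:50,692 INFO " + PREFIX + "blk_2259194263704433881_10277076",
    "2013-05-01 05:53:50,740 INFO " + PREFIX + "blk_2653195657740262633_18315696",
]


def scan_rate(lines):
    """Return (verified_count, blocks_per_second) for the given log lines.

    The rate is None when fewer than two matching entries are found or
    when they all carry the same timestamp.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # %f accepts the three-digit millisecond field after the comma.
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if len(stamps) < 2:
        return len(stamps), None
    elapsed = (max(stamps) - min(stamps)).total_seconds()
    if elapsed <= 0:
        return len(stamps), None
    # N entries span N - 1 intervals.
    return len(stamps), (len(stamps) - 1) / elapsed


if __name__ == "__main__":
    count, rate = scan_rate(SAMPLE)
    print(count, "blocks verified, about", round(rate, 1), "blocks per second")
```

Four entries are far too few to say anything about the real rate. On a real DN log it would make more sense to feed in a window of several minutes and compare the result against the "Not yet verified" count from the blockScannerReport.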

The second idea is good, but I cannot merge the small files (~1 MB each)
since they are all in Hive table partitions.
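
For reference, the arithmetic behind the merge suggestion can be sketched as follows. It relies on the fact that every non-empty HDFS file occupies at least one block, and it assumes a 64 MB block size, which may not match this cluster's configuration. The function name block_count is hypothetical, and the figures are illustrative rather than measured.

```python
MB = 1024 * 1024


def block_count(file_sizes_bytes, block_size_bytes):
    """Number of HDFS blocks (before replication) needed for the given files.

    A file of size s needs ceil(s / block_size) blocks, so every non-empty
    file costs at least one block no matter how small it is.
    """
    # -(-s // b) is integer ceiling division, which avoids float rounding.
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes)


if __name__ == "__main__":
    block_size = 64 * MB               # assumed block size
    small_files = [1 * MB] * 1000000   # one million files of ~1 MB each
    merged = [sum(small_files)]        # the same bytes stored as one large file

    print("small files:", block_count(small_files, block_size), "blocks")
    print("merged:     ", block_count(merged, block_size), "blocks")
```

Under these assumptions the same data drops from 1,000,000 blocks to 15,625. The count each DN actually holds also depends on the replication factor and the number of DNs, which this sketch ignores.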

-Selva
On Wed, May 1, 2013 at 2:25 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Neither block reports nor block scanning should affect general DN I/O,
> although the former may affect DN liveness in older versions if
> they lack HDFS-2379. Brahma is therefore partially right in having
> mentioned the block reports.
>
> Your solution, if the number of blocks per DN is too high (counts are
> available on the Live Nodes page in the NN UI), say more than 1 million
> blocks, is to simply raise the DN heap by another GB to fix the issues
> immediately, and then start working on merging small files together for
> more efficient processing and to reduce the overall block count, which
> lowers memory pressure at the DNs.
>
>
>
> On Wed, May 1, 2013 at 12:02 PM, selva <[EMAIL PROTECTED]> wrote:
> > Thanks a lot Harsh. Your input is really valuable for me.
> >
> > As you mentioned above, we have an overload of many small files in our
> > cluster.
> >
> > Also, when I load huge data into Hive tables, it throws an exception
> > like "replicated to 0 nodes instead of 1". When I googled it, I found
> > that one of the reasons matches my case: "Data Node is Busy with block
> > report and block scanning" @ http://bit.ly/ZToyNi
> >
> > Will increasing the block scanning rate and scanning all the inefficient
> > small files fix my problem?
> >
> > Thanks
> > Selva
> >
> >
> > On Wed, May 1, 2013 at 11:37 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> The block scanner is a simple, independent operation of the DN that
> >> runs periodically and does its work in small phases, to ensure that no
> >> blocks exist that don't match their checksums (it's an automatic
> >> data validator), so that it can report corrupt/rotting blocks and
> >> keep the cluster healthy.
> >>
> >> Its runtime shouldn't cause any issues, unless your DN has a lot of
> >> blocks (more than normal due to overload of small, inefficient files)
> >> but too little heap size to perform retention plus block scanning.
> >>
> >> > 1. Will the data node not allow data to be written during the
> >> > DataBlockScanning process?
> >>
> >> No such thing. As I said, it's independent and mostly lock-free. Writes
> >> or reads are not hampered.
> >>
> >> > 2. Will the data node return to normal only when "Not yet verified"
> >> > comes down to zero in the data node blockScannerReport?
> >>
> >> Yes, but note that this runs over and over again (once every 3 weeks
> >> IIRC).
> >>
> >> On Wed, May 1, 2013 at 11:33 AM, selva <[EMAIL PROTECTED]> wrote:
> >> > Thanks Harsh & Manoj for the inputs.
> >> >
> >> > I have now found that the data node is busy with block scanning. I have
> >> > TBs of data attached to each data node, so it is taking days to complete
> >> > the data block scanning. I have two questions.
> >> >
> >> > 1. Will the data node not allow data to be written during the
> >> > DataBlockScanning process?
> >> >
> >> > 2. Will the data node return to normal only when "Not yet verified"
> >> > comes down to zero in the data node blockScannerReport?
> >> >
> >> > # Data node logs
> >> >
> >> > 2013-05-01 05:53:50,639 INFO
> >> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> >> > succeeded for blk_-7605405041820244736_20626608
> >> > 2013-05-01 05:53:50,664 INFO
> >> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> >> > succeeded for blk_-1425088964531225881_20391711
> >> > 2013-05-01 05:53:50,692 INFO
> >> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> >> > succeeded for blk_2259194263704433881_10277076
> >> > 2013-05-01 05:53:50,740 INFO
> >> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> >> > succeeded for blk_2653195657740262633_18315696
