MapReduce, mail # user - Hadoop recovery test


+
Artem Ervits 2012-09-17, 21:38
+
Harsh J 2012-09-18, 02:43
Re: Hadoop recovery test
Robert Molina 2012-09-17, 21:55
Hi Artem,
At what point did you make the copy? Was the namenode still running? Do the
copies of the edits and fsimage files match the originals (i.e., in file
size)?

-Robert
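
One way to run the check Robert asks about is to compare the original namenode metadata directory with the copy, file by file. The following is only a sketch: `compare_dirs` and `file_digest` are hypothetical helper names, and the two directory paths are whatever you used for the original and the copied name directory. It checks file sizes first and falls back to a SHA-256 checksum when the sizes agree.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(original, copy):
    """Compare every file under `original` with its counterpart under `copy`.

    Returns a list of (relative_path, problem) tuples; an empty list means
    every file exists in the copy with the same size and checksum.
    """
    original, copy = Path(original), Path(copy)
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = copy / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing in copy"))
        elif src.stat().st_size != dst.stat().st_size:
            problems.append((rel.as_posix(), "size differs"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "checksum differs"))
    return problems


# Example call (both paths are placeholders, not taken from the thread):
# for rel, problem in compare_dirs("/path/to/name-dir", "/path/to/name-dir-copy"):
#     print(rel, problem)
```

An empty result only shows that the copy matches what was on disk at the moment of the comparison. If the namenode was still running when the copy was taken, the edits file may have kept changing afterwards, so it is safest to run the comparison once the namenode has been shut down.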

On Mon, Sep 17, 2012 at 2:38 PM, Artem Ervits <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I am testing the Hadoop recovery as per the
> http://wiki.apache.org/hadoop/NameNode document. But instead of using an
> NFS share, I am copying to another directory. Then when I shut down the
> cluster, I scp that directory to another server and start the Hadoop cluster
> using that machine as the namenode. I see in the log that some blocks are
> corrupt and/or missing. Do I have to wait for replication to recover all
> blocks, or am I doing something else altogether? I am using Hadoop 1.0.3.
> Can someone point me to a more detailed document than the wiki, in case I’m
> doing something wrong?
>
> P.S. If I restart the cluster using the original namenode, the filesystem
> reports as healthy.
>
> Thank you.
>
> .
> /hdfs/hadoop/tmp/mapred/system/jobtracker.info: CORRUPT block blk_9043419219670949307
>
> /hdfs/hadoop/tmp/mapred/system/jobtracker.info: MISSING 1 blocks of total size 4 B...
> /user/hduser/teragen/_logs/history/job_201209120941_0002_1347458152167_hduser_TeraGen: Under replicated blk_-976282286234272458_1079. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen/_logs/history/job_201209120941_0002_conf.xml:  Under replicated blk_137658109390447967_1075. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen/_partition.lst:  Under replicated blk_-3005280481530403302_1080. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen/part-00000:  Under replicated blk_-7008813028808832816_1077. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen/part-00001:  Under replicated blk_-5256967771026054061_1078. Target Replicas is 3 but found 1 replica(s).
> ..
> /user/hduser/teragen-out/_logs/history/job_201209120941_0003_1347458249920_hduser_TeraSort: Under replicated blk_1137779303840586677_1089. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen-out/_logs/history/job_201209120941_0003_conf.xml: Under replicated blk_7701720691642589882_1086. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen-out/part-00000: CORRUPT block blk_8059469267617478950
>
> /user/hduser/teragen-out/part-00000: MISSING 1 blocks of total size 1000000 B...
> /user/hduser/teragen-validate/_logs/history/job_201209120941_0004_1347458495941_hduser_TeraValidate: Under replicated blk_5680565744062298575_1098. Target Replicas is 3 but found 1 replica(s).
> .
> /user/hduser/teragen-validate/_logs/history/job_201209120941_0004_conf.xml: Under replicated blk_1566253937037013126_1095. Target Replicas is 3 but found 1 replica(s).
>
> .Status: CORRUPT
> Total size:    1050720258 B
> Total dirs:    39
> Total files:   32
> Total blocks (validated):      42 (avg. block size 25017149 B)
>   ************************************
>   CORRUPT FILES:        2
>   MISSING BLOCKS:       2
>   MISSING SIZE:         1000004 B
>   CORRUPT BLOCKS:       2
>   ************************************
> Minimally replicated blocks:   40 (95.2381 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       40 (95.2381 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     0.95238096
> Corrupt blocks:                2
> Missing replicas:              80 (200.0 %)
> Number of data-nodes:          1
> Number of racks:               1
> FSCK ended at Mon Sep 17 17:29:08 EDT 2012 in 21 milliseconds
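
The figures in the fsck summary above are consistent with a single datanode having reported in while the target replication factor is 3. The short calculation below re-derives them from the quoted numbers. It is a sketch rather than fsck's actual code; in particular, it assumes the "Missing replicas" percentage is taken relative to the replicas that were actually found.

```python
# Figures copied from the fsck summary quoted above.
total_blocks = 42
missing_blocks = 2
target_replication = 3
found_per_block = 1  # "found 1 replica(s)" on every under-replicated block

# Blocks that have at least one replica somewhere.
present_blocks = total_blocks - missing_blocks
# Replicas that exist right now (one per present block).
live_replicas = present_blocks * found_per_block
# Replicas still needed to reach the target on the present blocks.
missing_replicas = present_blocks * (target_replication - found_per_block)

minimally_replicated_pct = 100.0 * present_blocks / total_blocks
average_replication = live_replicas / total_blocks
# Assumption: percentage is relative to the replicas actually found.
missing_replicas_pct = 100.0 * missing_replicas / live_replicas

print(f"Minimally replicated blocks: {present_blocks} ({minimally_replicated_pct:.4f} %)")
print(f"Average block replication:   {average_replication:.8f}")
print(f"Missing replicas:            {missing_replicas} ({missing_replicas_pct:.1f} %)")
```

With one datanode, the 40 under-replicated blocks cannot reach three replicas until more datanodes join, and re-replication cannot restore the 2 blocks reported as MISSING, because no replica of them has been found to copy from.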
+
James Brown 2012-09-17, 21:48