-
HDFS is corrupt, need to salvage the data.
Mayuran Yogarajah 2009-03-10, 00:20
Hello, it seems the HDFS in my cluster is corrupt. This is the output from hadoop fsck:

  Total size:    9196815693 B
  Total dirs:    17
  Total files:   157
  Total blocks:  157 (avg. block size 58578443 B)
   ********************************
   CORRUPT FILES:        157
   MISSING BLOCKS:       157
   MISSING SIZE:         9196815693 B
   ********************************
  Minimally replicated blocks:   0 (0.0 %)
  Over-replicated blocks:        0 (0.0 %)
  Under-replicated blocks:       0 (0.0 %)
  Mis-replicated blocks:         0 (0.0 %)
  Default replication factor:    1
  Average block replication:     0.0
  Missing replicas:              0
  Number of data-nodes:          1
  Number of racks:               1
It seems to say that one block is missing from every file that was in the cluster.
I'm not sure how to proceed, so any guidance would be much appreciated. My primary concern is recovering the data.
thanks
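The fsck summary can be cross-checked with a short script. Below is a minimal Python sketch, assuming the summary was saved as plain text with the same labels as above; the function name `summarize_fsck` is hypothetical. With these numbers, every block the NameNode knows about is missing (157 of 157) and all 157 files are reported corrupt, which is a complete loss of visible blocks rather than a partial one.

```python
import re

# Labels as they appear in the fsck summary quoted above (assumed stable across runs).
FIELDS = {
    "total_files": r"Total files:\s*(\d+)",
    "total_blocks": r"Total blocks:\s*(\d+)",
    "corrupt_files": r"CORRUPT FILES:\s*(\d+)",
    "missing_blocks": r"MISSING BLOCKS:\s*(\d+)",
    "datanodes": r"Number of data-nodes:\s*(\d+)",
}


def summarize_fsck(text):
    """Extract integer counters from fsck summary text and derive the missing ratio."""
    stats = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    blocks = stats.get("total_blocks", 0)
    if blocks:
        # Fraction of all known blocks that fsck reports as missing.
        stats["missing_fraction"] = stats.get("missing_blocks", 0) / blocks
    return stats


if __name__ == "__main__":
    sample = (
        "Total size: 9196815693 B Total dirs: 17 Total files: 157 "
        "Total blocks: 157 (avg. block size 58578443 B) "
        "CORRUPT FILES: 157 MISSING BLOCKS: 157 MISSING SIZE: 9196815693 B "
        "Number of data-nodes: 1"
    )
    print(summarize_fsck(sample))
```

A missing fraction of 1.0 with a single datanode points at that datanode not reporting its blocks, rather than at scattered disk damage.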
-
Re: HDFS is corrupt, need to salvage the data.
lohit 2009-03-10, 00:34
How many datanodes do you have? From the output, it looks like only one datanode was connected to your NameNode when you ran fsck. Did you have others? Also, I see that your default replication is set to 1. Can you check whether your datanodes are up and running?
Lohit
-
Re: HDFS is corrupt, need to salvage the data.
Mayuran Yogarajah 2009-03-10, 17:14
There is only one datanode at the moment. Does this mean the data is not recoverable? The hard drive on the machine seems fine, so I'm a little confused as to what caused HDFS to become corrupted.
M
-
Re: HDFS is corrupt, need to salvage the data.
Raghu Angadi 2009-03-10, 18:44
Block files don't usually disappear easily. Check whether the datanode has any files whose names start with "blk". Also check the datanode log to see what happened there; maybe the datanode was started on a different directory, or something like that.
Raghu.
-
Re: HDFS is corrupt, need to salvage the data.
Mayuran Yogarajah 2009-03-10, 19:19
There are indeed blk files:

  find -name 'blk*' | wc -l
  158
I didn't see anything out of the ordinary in the datanode log.
At this point, is there anything I can do to recover the files? Or do I need to reformat the datanode and load the data in again?
thanks
-
Re: HDFS is corrupt, need to salvage the data.
Mayuran Yogarajah 2009-03-11, 18:52
Sorry to resend this, but I didn't receive a response and wanted to know how to proceed. Is it possible to recover the data at this stage, or is it gone?
thanks
-
Re: HDFS is corrupt, need to salvage the data.
Raghu Angadi 2009-03-11, 19:08
Mayuran,
It takes a very long time if we have to go through each debugging step one iteration at a time. Maybe a JIRA issue is a better place for this.
- Run fsck with the -blocks option.
- Check whether those block IDs match the IDs in the file names found by 'find'.
- Check which directory these files are in, and verify that it matches the datanode's configured data directory (dfs.data.dir).
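The second and third checks can be scripted. The following is a minimal Python sketch, assuming the output of `hadoop fsck / -files -blocks` was saved to a text file, that fsck names blocks like `blk_<id>_<genstamp>`, and that block data files on the datanode are named `blk_<id>` (their `.meta` checksum files carry a longer name). These naming details, the helper names, and the script name `check_blocks.py` are assumptions for illustration, not taken from this thread. It reports the block IDs fsck lists that have no data file under the given root, and the directories where matching files were found, so those can be compared with the configured data directory.

```python
import os
import re
import sys

# Assumed: fsck prints block names like "blk_<id>_<genstamp>"; capture only <id>.
FSCK_BLOCK_RE = re.compile(r"blk_(-?\d+)")
# Assumed: block data files are named exactly "blk_<id>"; ".meta" files do not match.
DATA_FILE_RE = re.compile(r"blk_(-?\d+)")


def ids_from_fsck(text):
    """Return the set of block IDs mentioned in saved fsck output."""
    return set(FSCK_BLOCK_RE.findall(text))


def ids_on_disk(root):
    """Map each block ID found under root to the directories holding its data file."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = DATA_FILE_RE.fullmatch(name)
            if match:
                found.setdefault(match.group(1), set()).add(dirpath)
    return found


def compare(fsck_text, root):
    """Return (IDs with no data file under root, directories holding matched files)."""
    expected = ids_from_fsck(fsck_text)
    on_disk = ids_on_disk(root)
    missing = sorted(expected - set(on_disk))
    dirs = sorted({d for block_id in expected & set(on_disk) for d in on_disk[block_id]})
    return missing, dirs


if __name__ == "__main__":
    # Usage: python check_blocks.py fsck_output.txt /path/to/datanode/data/dir
    with open(sys.argv[1]) as f:
        missing, dirs = compare(f.read(), sys.argv[2])
    print(f"{len(missing)} block(s) listed by fsck have no data file under {sys.argv[2]}")
    for d in dirs:
        print("matching block files found in:", d)
```

If every ID matches a file but the reported directory differs from the configured one, that would fit the "started on a different directory" theory above.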
You say there is nothing wrong in the log files, but does that mean the datanode sees those 157 missing blocks? Maybe you should post the log or verify that yourself. If the DN is working correctly, you should not have 100% of the blocks missing.
There are many possibilities; it is not easy for me to pick the right one in your case without more information, or to list all possible conditions.
Raghu.