how to handle the corrupt block in HDFS?


ch huang 2013-12-10, 00:32
ch huang 2013-12-10, 01:15
ch huang 2013-12-10, 01:20
ch huang 2013-12-11, 01:18
Vinayakumar B 2013-12-11, 01:21
Adam Kawa 2013-12-11, 18:33
ch huang 2013-12-12, 00:44
Re: how to handle the corrupt block in HDFS?
And does fsck report data from the BlockPoolSliceScanner? It seems to run only
once every 3 weeks.
Can I restart the DNs one by one without interrupting the jobs that are running?
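For reference, a minimal sketch of how to check the block scanner interval being asked about; dfs.datanode.scan.period.hours is the standard property, but note that some versions ship a default of 0, which simply means "use the built-in 3-week (504-hour) period":

$ # print the configured DataNode block scanner period, in hours
$ hdfs getconf -confKey dfs.datanode.scan.period.hours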

On Thu, Dec 12, 2013 at 2:33 AM, Adam Kawa <[EMAIL PROTECTED]> wrote:

>  I have only a 1-node cluster, so I am not able to verify it when the
> replication factor is bigger than 1.
>
>  I ran fsck on a file that consists of 3 blocks, and 1 block has a
> corrupt replica. fsck reported that the system is HEALTHY.
>
> When I restarted the DN, the block scanner (BlockPoolSliceScanner)
> started and it detected the corrupt replica. Then I ran fsck again on that
> file, and it told me that the system is CORRUPT.
>
> If you have a small (and non-production) cluster, can you restart your
> datanodes and run fsck again?
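A rough sketch of the rolling check suggested here, assuming replication 3 so that restarting one DataNode at a time does not take any block offline; the hostnames and the init-script name are placeholders that vary by distribution:

# restart DataNodes one by one so running jobs keep their other replicas
for dn in dn01 dn02 dn03; do
    ssh "$dn" 'sudo service hadoop-hdfs-datanode restart'
    sleep 120   # let the DN re-register and its block scanner start
done

# then re-run fsck on the suspect file
sudo -u hdfs hdfs fsck /path/to/suspect-file -files -blocks -locations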
>
>
>
> 2013/12/11 ch huang <[EMAIL PROTECTED]>
>
>> Thanks for the reply, but if a block has just 1 corrupt replica, hdfs fsck
>> cannot tell you which block of which file has the corrupted replica; fsck is
>> only useful when all of a block's replicas are bad.
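One place a single corrupt replica does show up is the log of the DataNode holding it, once the block scanner has flagged it; a sketch, with the log path and the exact message wording being distribution/version dependent:

$ # on the DataNode suspected of holding the bad replica (log path varies by distro)
$ grep -i corrupt /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log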
>>
>> On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa <[EMAIL PROTECTED]> wrote:
>>
>>> When you identify a file with corrupt block(s), you can locate the
>>> machines that store its blocks by typing
>>> $ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations
>>>
>>>
>>> 2013/12/11 Adam Kawa <[EMAIL PROTECTED]>
>>>
>>>> Maybe this can work for you
>>>> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
>>>> ?
>>>>
>>>>
>>>> 2013/12/11 ch huang <[EMAIL PROTECTED]>
>>>>
>>>>> Thanks for the reply. What I do not know is how to locate the block
>>>>> that has the corrupt replica (so I can observe how long it takes for the
>>>>> corrupt replica to be removed and replaced by a new healthy one). I have
>>>>> been getting a Nagios alert for three days, and I am not sure whether the
>>>>> same corrupt replica is causing the alert; I also do not know at what
>>>>> interval HDFS checks for corrupt replicas and cleans them up.
>>>>>
>>>>>
>>>>> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi ch huang,
>>>>>>
>>>>>> It may seem strange, but the fact is:
>>>>>>
>>>>>> CorruptBlocks through JMX means “Number of blocks with corrupt
>>>>>> replicas”. Not all replicas are necessarily corrupt; you can check the
>>>>>> description through jconsole.
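A minimal sketch of reading that JMX counter without jconsole, via the NameNode's /jmx servlet; the hostname and the default 50070 web UI port are assumptions for your cluster:

$ # the servlet returns pretty-printed JSON, one attribute per line
$ curl -s 'http://namenode.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
      | grep -E '"(CorruptBlocks|MissingBlocks|UnderReplicatedBlocks)"'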
>>>>>>
>>>>>>
>>>>>>
>>>>>> Whereas Corrupt blocks through fsck means blocks with all
>>>>>> replicas corrupt (non-recoverable) or missing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In your case, maybe one replica is corrupt, not all replicas of the
>>>>>> same block. This corrupt replica will be deleted automatically if one more
>>>>>> datanode is available in your cluster and the block is replicated to it.
>>>>>>
>>>>>> Related to replication 10: as Peter Marron said, some of the
>>>>>> important files of a MapReduce job are written with a replication of 10, to
>>>>>> make them accessible faster and to launch map tasks faster.
>>>>>>
>>>>>> Anyway, if the job succeeds these files will be deleted
>>>>>> automatically. I think only in some cases, if the job is killed in between,
>>>>>> will these files remain in HDFS showing under-replicated blocks.
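If leftover job-submission files are what keeps the under-replicated alert firing, a sketch of one way to clean them up; the staging path is only an example and differs between MRv1 and YARN setups:

$ # list leftover submission files (example path, not universal)
$ sudo -u hdfs hdfs dfs -ls -R /user/someuser/.staging
$ # either delete them, or drop their replication back to the cluster default
$ sudo -u hdfs hdfs dfs -setrep -w 3 /user/someuser/.staging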
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks and Regards,
>>>>>>
>>>>>> Vinayakumar B
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Peter Marron [mailto:[EMAIL PROTECTED]]
>>>>>> Sent: 10 December 2013 14:19
>>>>>> To: [EMAIL PROTECTED]
>>>>>> Subject: RE: how to handle the corrupt block in HDFS?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am sure that there are others who will answer this better, but
>>>>>> anyway.
>>>>>>
>>>>>> The default replication level for files in HDFS is 3 and so most
>>>>>> files that you see will have a replication level of 3. However when you
>>>>>> run a Map/Reduce job the system knows in advance that every node will
>>>>>> need a copy of certain files. Specifically the job.xml and the various
>>>>>> jars
>
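For completeness, the replication level used for these job-submission files is itself configurable; the property name below (mapreduce.client.submit.file.replication, default 10 in MRv2) is quoted from memory, so verify it against your Hadoop version, and the config path is only an example:

$ # check whether the submit-file replication has been overridden locally
$ grep -A1 'mapreduce.client.submit.file.replication' /etc/hadoop/conf/mapred-site.xml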