Re: how to handle the corrupt block in HDFS?
When you identify a file with corrupt block(s), you can locate the
machines that store its blocks by running
$ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations
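
For example, the output looks roughly like the following (the path, block ID
and datanode addresses here are made up for illustration); the hosts listed
in brackets at the end of each block line are the machines holding the
replicas:

/user/foo/data.txt 134217728 bytes, 1 block(s):  OK
0. BP-929597290-10.0.0.1-1386000000000:blk_1073741825_1001 len=134217728 repl=3 [10.0.0.2:50010, 10.0.0.3:50010, 10.0.0.4:50010]
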
2013/12/11 Adam Kawa <[EMAIL PROTECTED]>

> Maybe this can work for you
> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
> ?
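>
> A rough sketch of feeding that list into a per-file fsck to see which
> datanodes hold each block (this assumes the listing prints the path of each
> affected file as the last field on its line; adjust the parsing to your
> version's exact output):
>
> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks \
>     | awk '{print $NF}' | grep '^/' \
>     | while read f; do
>         sudo -u hdfs hdfs fsck "$f" -files -blocks -locations
>       done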
>
>
> 2013/12/11 ch huang <[EMAIL PROTECTED]>
>
>> Thanks for the reply. What I do not know is how to locate the block that
>> has the corrupt replica, so that I can observe how long it takes for the
>> corrupt replica to be removed and replaced by a new healthy one. I have
>> been getting a Nagios alert for three days, and I am not sure whether the
>> same corrupt replica is causing the alert, nor do I know the interval at
>> which HDFS checks for corrupt replicas and cleans them up.
>>
>>
>> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <[EMAIL PROTECTED]>wrote:
>>
>>>  Hi ch huang,
>>>
>>>
>>>
>>> It may seem strange, but the fact is:
>>>
>>> *CorruptBlocks* through JMX means *“Number of blocks with corrupt
>>> replicas”*. It does not mean that all replicas of those blocks are
>>> corrupt. You can check the metric description through jconsole.
>>>
>>>
>>>
>>> Whereas *Corrupt blocks* through fsck means *blocks with all replicas
>>> corrupt (non-recoverable) or missing*.
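>>>
>>> A quick way to watch that JMX metric without jconsole is the NameNode's
>>> /jmx servlet (a sketch; the host name below is a placeholder and 50070 is
>>> the default NameNode HTTP port):
>>>
>>> $ curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
>>>     | grep -E '"(CorruptBlocks|UnderReplicatedBlocks|MissingBlocks)"'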
>>>
>>>
>>>
>>> In your case, maybe one of the replicas is corrupt, not all replicas of
>>> the same block. This corrupt replica will be deleted automatically once
>>> another datanode is available in your cluster and the block has been
>>> replicated to it.
>>>
>>>
>>>
>>>
>>>
>>> Regarding the replication factor of 10: as Peter Marron said, *some of
>>> the important files of the MapReduce job are written with a replication
>>> of 10, to make them accessible faster and to launch map tasks faster.*
>>>
>>> Anyway, if the job succeeds these files will be deleted automatically.
>>> I think it is only in some cases, if the job is killed in between, that
>>> these files will remain in HDFS showing under-replicated blocks.
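>>>
>>> If you want to see which files those under-replicated blocks belong to,
>>> a rough sketch (the exact wording of the fsck messages may differ between
>>> versions):
>>>
>>> $ sudo -u hdfs hdfs fsck / -files -blocks | grep -i 'under replicated'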
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Vinayakumar B
>>>
>>>
>>>
>>> *From:* Peter Marron [mailto:[EMAIL PROTECTED]]
>>> *Sent:* 10 December 2013 14:19
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* RE: how to handle the corrupt block in HDFS?
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am sure that there are others who will answer this better, but anyway.
>>>
>>> The default replication level for files in HDFS is 3 and so most files
>>> that you see will have a replication level of 3. However, when you run a
>>> Map/Reduce job the system knows in advance that every node will need a
>>> copy of certain files, specifically the job.xml and the various jars
>>> containing classes that will be needed to run the mappers and reducers.
>>> So the system arranges that some of these files have a higher replication
>>> level. This increases the chances that a copy will be found locally.
>>>
>>> By default this higher replication level is 10.
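>>>
>>> For reference, the property behind this is
>>> mapreduce.client.submit.file.replication (mapred.submit.replication in
>>> older releases), and it defaults to 10. A sketch of what lowering it on a
>>> small cluster would look like, if you ever wanted to:
>>>
>>> <!-- mapred-site.xml -->
>>> <property>
>>>   <name>mapreduce.client.submit.file.replication</name>
>>>   <value>3</value>
>>> </property>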
>>>
>>>
>>>
>>> This can seem a little odd on a cluster where you only have, say, 3
>>> nodes, because it means that you will almost always have some blocks
>>> that are marked under-replicated. I think that there was some discussion
>>> a while back to change this to make the replication level something like
>>> min(10, number of nodes). However, as I recall, the general consensus was
>>> that this was extra complexity that wasn’t really worth it. If it ain’t
>>> broke…
>>>
>>>
>>>
>>> Hope that this helps.
>>>
>>>
>>>
>>> *Peter Marron*
>>>
>>> Senior Developer, Research & Development
>>>
>>>
>>>
>>> Office: +44 *(0) 118-940-7609*  [EMAIL PROTECTED]
>>>
>>> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>>>
>>>
>>>
>>> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
>>>
>>> Be Certain About Your Data. Be Trillium Certain.
>>>
>>>
>>>
>>> *From:* ch huang [mailto:[EMAIL PROTECTED] <[EMAIL PROTECTED]>]