Re: how to handle the corrupt block in HDFS?
"By default this higher replication level is 10. "
is this value can be control via some option or variable? i only hive a
5-worknode cluster,and i think 5 replicas should be better,because every
node can get a local replica.
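
For reference, a sketch not taken from the thread itself: in Hadoop 2 the
staging-file replication is controlled by the property
mapreduce.client.submit.file.replication (mapred.submit.replication in
MRv1), which defaults to 10. Assuming the job driver uses ToolRunner, it
can be lowered per job; my-job.jar and MyJob are placeholder names here:

# hadoop jar my-job.jar MyJob -D mapreduce.client.submit.file.replication=5 /input /output

The same property can also be set cluster-wide in mapred-site.xml.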

Another question: why does hdfs fsck report the cluster as healthy with no
corrupt blocks, while I see one corrupt block when checking the NN metrics
with curl http://NNIP:50070/jmx ? Thanks.
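
A sketch for digging further, again not from the thread itself, with NNIP
standing for the NameNode host as above: the NameNode's JMX servlet
accepts a qry parameter to fetch a single bean, and fsck can list the
files that own corrupt blocks:

# curl -s 'http://NNIP:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'
# sudo -u hdfs hdfs fsck / -list-corruptfileblocks

One plausible explanation for the mismatch, not confirmed in this thread:
the CorruptBlocks counter in JMX counts blocks that have at least one
corrupt replica, while fsck reports a block as corrupt only when all of
its replicas are corrupt. A block with one bad replica and two good ones
therefore shows up in the metric while fsck still reports HEALTHY.
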
On Tue, Dec 10, 2013 at 4:48 PM, Peter Marron <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am sure that there are others who will answer this better, but anyway.
>
> The default replication level for files in HDFS is 3, and so most files
> that you see will have a replication level of 3. However, when you run a
> Map/Reduce job the system knows in advance that every node will need a
> copy of certain files. Specifically the job.xml and the various jars
> containing classes that will be needed to run the mappers and reducers.
> So the system arranges that some of these files have a higher replication
> level. This increases the chances that a copy will be found locally.
> By default this higher replication level is 10.
>
> This can seem a little odd on a cluster where you only have, say, 3 nodes,
> because it means that you will almost always have some blocks that are
> marked under-replicated. I think that there was some discussion a while
> back about changing this to make the replication level something like
> min(10, number of nodes). However, as I recall, the general consensus was
> that this was extra complexity that wasn't really worth it. If it ain't
> broke…
>
> Hope that this helps.
>
> Peter Marron
> Senior Developer, Research & Development
>
> Office: +44 (0) 118-940-7609  [EMAIL PROTECTED]
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
> www.trilliumsoftware.com
> Be Certain About Your Data. Be Trillium Certain.
>
> From: ch huang [mailto:[EMAIL PROTECTED]]
> Sent: 10 December 2013 01:21
> To: [EMAIL PROTECTED]
> Subject: Re: how to handle the corrupt block in HDFS?
>
> Even more strange: in my HDFS cluster every block has three replicas,
> but I find that some have ten replicas. Why?
>
> # sudo -u hdfs hadoop fs -ls
> /data/hisstage/helen/.staging/job_1385542328307_0915
> Found 5 items
> -rw-r--r--   3 helen hadoop          7 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
> -rw-r--r--  10 helen hadoop    2977839 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
> -rw-r--r--  10 helen hadoop       3696 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>
> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <[EMAIL PROTECTED]> wrote:
>
> The strange thing is that when I use the following command, I find 1
> corrupt block:
>
> #  curl -s http://ch11:50070/jmx |grep orrupt
>     "CorruptBlocks" : 1,
>
> but when I run hdfs fsck / I get none; everything seems fine:
>
> # sudo -u hdfs hdfs fsck /
>
> ....................................Status: HEALTHY
>  Total size:    1479728140875 B (Total open files size: 1677721600 B)
>  Total dirs:    21298
>  Total files:   100636 (Files currently being written: 25)
>  Total blocks (validated):      119788 (avg. block size 12352891 B) (Total
> open file blocks (not validated): 37)
>  Minimally replicated blocks:   119788 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       166 (0.13857816 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0027633
>  Corrupt blocks:                0
>  Missing replicas:              831 (0.23049656 %)
>  Number of data-nodes:          5
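
Regarding the ten-replica staging files listed above, a sketch using the
standard HDFS shell; the first path is the job.jar from this thread and
/path/to/file is a placeholder. hadoop fs -stat %r prints a file's
replication factor and hadoop fs -setrep changes it for an existing file:

# sudo -u hdfs hadoop fs -stat %r /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
# sudo -u hdfs hadoop fs -setrep 5 /path/to/file

The replication factor is also the second column of a hadoop fs -ls
listing, which is why job.jar and job.split show 10 while appTokens shows 3.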