MapReduce, mail # user - Re: final the dfs.replication and fsck


Re: final the dfs.replication and fsck
Patai Sangbutsarakum 2012-10-16, 00:02
Just want to share and check whether this makes sense.

Jobs failed to run after I restarted the namenode, and the cluster
stopped complaining about under-replication.

This is what I found in the log file:

Requested replication 10 exceeds maximum 2
java.io.IOException: file
/tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
Requested replication 10 exceeds maximum 2
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
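The IOException above is the NameNode rejecting a replication request higher than its configured cap. As a rough illustration of that check, here is a simplified, self-contained sketch; the class, method shape, and message format are modeled on the stack trace and error text, not copied from the actual Hadoop source:

```java
import java.io.IOException;

// Hypothetical sketch of the server-side check behind the error above
// (loosely modeled on FSNamesystem.verifyReplication; names are assumptions).
public class VerifyReplicationSketch {

    // Throws when the requested replication exceeds the configured maximum,
    // mirroring "Requested replication 10 exceeds maximum 2".
    static void verifyReplication(String src, short requested, short max)
            throws IOException {
        if (requested > max) {
            throw new IOException("file " + src + ". Requested replication "
                    + requested + " exceeds maximum " + max);
        }
    }

    public static void main(String[] args) throws IOException {
        // A request within the cap succeeds silently.
        verifyReplication("/tmp/ok.txt", (short) 2, (short) 2);

        // A job.jar staged with replication 10 against a maximum of 2 fails,
        // as in the log excerpt.
        try {
            verifyReplication("/tmp/.staging/job.jar", (short) 10, (short) 2);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```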
So I scanned through those XML config files, guessed at changing
<name>mapred.submit.replication</name> from 10 to 2, and restarted again.
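
For reference, the change described would look roughly like this (the property name is from the message; placing it in mapred-site.xml is an assumption):

```xml
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```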

That's when jobs can start running again.
Hopefully that change makes sense.
Thanks
Patai

On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
<[EMAIL PROTECTED]> wrote:
> Thanks Harsh, dfs.replication.max does do the magic!!
>
> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
>> Thank you, Harsh.  I did not know about dfs.replication.max.
>>
>>
>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Hey Chris,
>>>
>>> The dfs.replication param is an exception to the <final> config
>>> feature. If one uses the FileSystem API, one can pass in any short
>>> value they want the replication to be. This bypasses the
>>> configuration, and the setting (being per-file) is also applied
>>> client-side.
>>>
>>> The right way for an administrator to enforce a "max" replication
>>> value at create/setReplication time is to set
>>> dfs.replication.max to the desired value on the NameNode and restart
>>> it.
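
In hdfs-site.xml on the NameNode, that cap would look like the following (the value 2 matches the staging setup discussed in this thread; treat the exact placement as an assumption):

```xml
<property>
  <!-- Server-side cap: create/setReplication requests above this fail -->
  <name>dfs.replication.max</name>
  <value>2</value>
</property>
```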
>>>
>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>> <[EMAIL PROTECTED]> wrote:
>>> > Hello Patai,
>>> >
>>> > Has your configuration file change been copied to all nodes in the
>>> > cluster?
>>> >
>>> > Are there applications connecting from outside of the cluster?  If so,
>>> > then those clients could have separate configuration files or code
>>> > setting dfs.replication (and other configuration properties).  These
>>> > would not be limited by final declarations in the cluster's
>>> > configuration files.  <final>true</final> controls configuration file
>>> > resource loading, but it does not necessarily block different nodes or
>>> > different applications from running with completely different
>>> > configurations.
>>> >
>>> > Hope this helps,
>>> > --Chris
>>> >
>>> >
>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>> > <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >> Hi Hadoopers,
>>> >>
>>> >> I have
>>> >> <property>
>>> >>     <name>dfs.replication</name>
>>> >>     <value>2</value>
>>> >>     <final>true</final>
>>> >>   </property>
>>> >>
>>> >> set in hdfs-site.xml in the staging environment cluster. While the
>>> >> staging cluster is running the code that will later be deployed in
>>> >> production, that code tries to set dfs.replication to 3, 10, 50, or
>>> >> other values besides 2, whichever number the developers thought
>>> >> would fit the production environment.
>>> >>
>>> >> Even though I have already made the dfs.replication property final
>>> >> in the staging cluster, every time I run fsck on it I still see it
>>> >> report under-replication.
>>> >> I thought the final keyword would stop the value in the job config
>>> >> from being honored, but it