Re: mapred replication
Without the stack trace of the exceptions it is hard to tell.  The pruning
is asynchronous, but so is a node crashing with a replica on it.  The
client is supposed to detect this situation and find a new replica that
works.  I am not that familiar with the code, but I believe in some if not
all of these cases it will log the exception to indicate that something
bad happened, but it recovered.
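
A rough sketch of the recovery pattern Bobby describes, purely illustrative and not the actual DFSClient code; the ReplicaLocation type, fetchFromReplica helper, and LOG handle are hypothetical stand-ins, and java.util.List / java.io.IOException are assumed imported:

    // Illustrative only: try each known replica and fall back to the next one
    // when a read fails, logging the failure rather than aborting.
    byte[] readWithFailover(List<ReplicaLocation> replicas) throws IOException {
      IOException lastFailure = null;
      for (ReplicaLocation replica : replicas) {
        try {
          return fetchFromReplica(replica);   // hypothetical read from one datanode
        } catch (IOException e) {
          lastFailure = e;                    // replica pruned or node crashed
          LOG.warn("Read from " + replica + " failed, trying next replica", e);
        }
      }
      if (lastFailure != null) throw lastFailure;
      throw new IOException("no replicas available");
    }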

--Bobby

On 8/16/13 4:40 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:

>Why should this lead to an IOException?  Is it because the pruning of
>replicas is asynchronous and the datanodes try to access nonexistent
>files?  If so, that seems like a pretty major bug.
>
>
>On Fri, Aug 16, 2013 at 5:21 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
>
>> Hi Eric,
>>
>> Yes, this is intentional.  The job.xml file and the job jar file get read
>> from every node running a map or reduce task.  Because of this, using a
>> higher than normal replication factor on these files improves locality.
>> More than 3 task slots will have access to local replicas.  These files
>> tend to be much smaller than the actual data read by a job, so there tends
>> to be little harm done in terms of disk space consumption.
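
For readers who want to confirm this on a live cluster, a minimal hedged example (not part of the original thread) that reads back the replication factor the namenode currently records for a job file; the path is taken from the command line and is just a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckJobFileReplication {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path to a submitted job file (job.xml, job.jar, or the split file).
        Path jobFile = new Path(args[0]);
        FileStatus status = fs.getFileStatus(jobFile);
        // Prints the replication factor currently recorded for the file.
        System.out.println(jobFile + " replication = " + status.getReplication());
      }
    }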
>>
>> Why not create the file initially with 10 replicas instead of creating it
>> with 3 and then dialing up?  I imagine this is so that job submission
>> doesn't block on a synchronous write to a long pipeline.  The extra
>> replicas aren't necessary for correctness, and a long-running job will get
>> the locality benefits in the long term once more replicas are created in
>> the background.
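
For contrast, a hedged sketch of the alternative Chris is ruling out, asking for all replicas up front via the FileSystem.create overload that takes a replication factor; it reuses the fs, splitFile, and job variables from the createFile snippet quoted further down the thread and is not what the framework actually does:

    // Hedged sketch only: creating the file with the high replication factor up
    // front would make the initial write pipeline mapred.submit.replication
    // datanodes long, so job submission would block on acks from every one of them.
    short replication = (short) job.getInt("mapred.submit.replication", 10);
    FSDataOutputStream out = fs.create(splitFile,
        true,                                        // overwrite
        job.getInt("io.file.buffer.size", 4096),     // buffer size
        replication,                                 // pipeline length == replication
        fs.getDefaultBlockSize());                   // block size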
>>
>> I recommend submitting a new jira describing the problem that you saw.  We
>> probably can handle this better, and a jira would be a good place to
>> discuss the trade-offs.  A few possibilities:
>>
>> Log a warning if mapred.submit.replication < dfs.replication.
>> Skip resetting replication if mapred.submit.replication < dfs.replication.
>> Fail with error if mapred.submit.replication < dfs.replication.
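
A hedged sketch of the first possibility (logging a warning) as it might look inside createFile; this is illustrative only, not an actual patch, and LOG is a hypothetical logger handle:

    int replication = job.getInt("mapred.submit.replication", 10);
    short dfsReplication = fs.getDefaultReplication();
    if (replication < dfsReplication) {
      // Lowering the replication right after the write just forces the namenode
      // to prune freshly written replicas, so warn instead of doing it silently.
      LOG.warn("mapred.submit.replication (" + replication
          + ") is less than dfs.replication (" + dfsReplication
          + "); excess job-file replicas will be pruned after the initial write.");
    }
    fs.setReplication(splitFile, (short) replication);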
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, Aug 15, 2013 at 6:21 AM, Sirianni, Eric <[EMAIL PROTECTED]> wrote:
>>
>> > In debugging some replication issues in our HDFS environment, I noticed
>> > that the MapReduce framework uses the following algorithm for setting the
>> > replication on submitted job files:
>> >
>> > 1.     Create the file with *default* DFS replication factor (i.e.
>> > 'dfs.replication')
>> >
>> > 2.     Subsequently alter the replication of the file based on the
>> > 'mapred.submit.replication' config value
>> >
>> >   private static FSDataOutputStream createFile(FileSystem fs, Path splitFile,
>> >       Configuration job)  throws IOException {
>> >     FSDataOutputStream out = FileSystem.create(fs, splitFile,
>> >         new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
>> >     int replication = job.getInt("mapred.submit.replication", 10);
>> >     fs.setReplication(splitFile, (short)replication);
>> >     writeSplitHeader(out);
>> >     return out;
>> >   }
>> >
>> > If I understand correctly, the net functional effect of this approach is
>> > that:
>> >
>> > -       The initial write pipeline is set up with 'dfs.replication' nodes
>> > (i.e. 3)
>> >
>> > -       The namenode triggers additional inter-datanode replications in
>> > the background (as it detects the blocks as "under-replicated").
>> >
>> > I'm assuming this is intentional?  Alternatively, if the
>> > mapred.submit.replication was specified on initial create, the write
>> > pipeline would be significantly larger.
>> >
>> > The reason I noticed is that we had inadvertently specified
>> > mapred.submit.replication as *less than* dfs.replication in our
>> > configuration, which caused a bunch of excess replica pruning (and
>> > ultimately IOExceptions in our datanode logs).
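
To make the failure mode concrete, a hedged reconstruction of that misconfiguration with made-up values and a placeholder path (nothing here is taken from Eric's cluster); the usual org.apache.hadoop.conf and org.apache.hadoop.fs imports are assumed:

    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 3);             // cluster default replication
    conf.setInt("mapred.submit.replication", 2);   // inadvertently lower than dfs.replication
    FileSystem fs = FileSystem.get(conf);

    Path splitFile = new Path("/tmp/job.split");   // placeholder path
    FSDataOutputStream out = FileSystem.create(fs, splitFile,
        new FsPermission((short) 0644));
    out.close();                                   // written through a 3-node pipeline

    // Dialing replication *down* afterwards marks one replica as excess; the
    // namenode deletes it in the background, and reads racing that pruning can
    // surface as IOExceptions in the datanode logs.
    fs.setReplication(splitFile, (short) conf.getInt("mapred.submit.replication", 10));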
>> >
>> > Thanks,
>> > Eric
>> >
>> >
>>