HDFS, mail # dev - mapred replication


Sirianni, Eric 2013-08-15, 13:21
Chris Nauroth 2013-08-16, 21:21
Jay Vyas 2013-08-16, 21:40
Re: mapred replication
Robert Evans 2013-08-19, 14:10
Without the stack trace of the exceptions it is hard to tell.  The pruning
is asynchronous, but so is a node crashing with a replica on it.  The
client is supposed to detect this situation and find a new replica that
works.  I am not that familiar with the code, but I believe in some if not
all of these cases it will log the exception to indicate that something
bad happened, but it recovered.
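Bobby's description of the client recovering from a missing replica can be illustrated with a toy model. This sketch assumes the client holds an ordered list of replica locations and moves on to the next one after an `IOException`, logging each failure. `ReplicaFailover`, `readFrom` and the node names are hypothetical. This is not the actual HDFS client implementation.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;

// Hypothetical model of a client that falls back to another replica when the
// one it picked has disappeared (pruned, or on a crashed node).
// NOT the real HDFS client code; all names are illustrative.
class ReplicaFailover {

    /** Simulated read: fails if the node no longer holds the replica. */
    static String readFrom(String node, Set<String> liveReplicas) throws IOException {
        if (!liveReplicas.contains(node)) {
            throw new IOException("replica missing on " + node);
        }
        return "data@" + node;
    }

    /** Try each known location in order; log failures and keep going. */
    static String read(List<String> locations, Set<String> liveReplicas) throws IOException {
        IOException last = null;
        for (String node : locations) {
            try {
                return readFrom(node, liveReplicas);
            } catch (IOException e) {
                // Logged but not fatal: another replica may still be readable.
                System.err.println("WARN: " + e.getMessage() + ", trying next replica");
                last = e;
            }
        }
        throw new IOException("no live replica found", last);
    }

    public static void main(String[] args) throws IOException {
        // dn1's replica was pruned after the client fetched the locations.
        List<String> locations = List.of("dn1", "dn2", "dn3");
        Set<String> live = Set.of("dn2", "dn3");
        System.out.println(read(locations, live)); // prints data@dn2
    }
}
```

The logged warning followed by a successful read from `dn2` matches the pattern Bobby describes: an exception in the logs that does not mean the operation failed.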

--Bobby

On 8/16/13 4:40 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:

>Why should this lead to an IOException?  Is it because the pruning of
>replicas is asynchronous and the datanodes try to access nonexistent
>files?  If so, that seems like a pretty major bug.
>
>
>On Fri, Aug 16, 2013 at 5:21 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
>
>> Hi Eric,
>>
>> Yes, this is intentional.  The job.xml file and the job jar file get read
>> from every node running a map or reduce task.  Because of this, using a
>> higher than normal replication factor on these files improves locality.
>> More than 3 task slots will have access to local replicas.  These files
>> tend to be much smaller than the actual data read by a job, so there tends
>> to be little harm done in terms of disk space consumption.
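To put rough numbers on the locality argument, here is a back-of-the-envelope sketch. It assumes one task slot per node, replicas placed on distinct nodes, and a hypothetical 20-node cluster. `JobFileLocality` and `localFraction` are illustrative names, not Hadoop code.

```java
// Back-of-the-envelope model: what fraction of nodes can read a job file
// locally? Assumes one task slot per node and replicas on distinct nodes.
// Class and method names are hypothetical.
class JobFileLocality {

    /** Fraction of nodes holding a local replica of the job file. */
    static double localFraction(int replication, int clusterNodes) {
        // A file cannot have more useful replicas than there are nodes.
        return (double) Math.min(replication, clusterNodes) / clusterNodes;
    }

    public static void main(String[] args) {
        int nodes = 20; // hypothetical cluster size
        System.out.println(localFraction(3, nodes));  // 0.15
        System.out.println(localFraction(10, nodes)); // 0.5
    }
}
```

Under these assumptions, raising replication from 3 to 10 lets half the nodes, rather than 15%, read job.xml and the job jar locally. Since the files are small, the extra disk cost is minor.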
>>
>> Why not create the file initially with 10 replicas instead of creating it
>> with 3 and then dialing up?  I imagine this is so that job submission
>> doesn't block on a synchronous write to a long pipeline.  The extra
>> replicas aren't necessary for correctness, and a long-running job will get
>> the locality benefits in the long term once more replicas are created in
>> the background.
>>
>> I recommend submitting a new jira describing the problem that you saw.  We
>> probably can handle this better, and a jira would be a good place to
>> discuss the trade-offs.  A few possibilities:
>>
>> - Log a warning if mapred.submit.replication < dfs.replication.
>> - Skip resetting replication if mapred.submit.replication < dfs.replication.
>> - Fail with error if mapred.submit.replication < dfs.replication.
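The three options above could look roughly like this at job submission time. This is a hypothetical sketch, not a Hadoop patch: `SubmitReplicationCheck`, `Policy` and `choose` are invented names, and the two config values are passed in as plain ints rather than read from a `Configuration`.

```java
// Hypothetical sketch of the three proposed ways to handle
// mapred.submit.replication < dfs.replication. Not a Hadoop API.
class SubmitReplicationCheck {

    enum Policy { WARN, SKIP, FAIL }

    /** Replication factor to apply to submitted job files. */
    static short choose(int submitReplication, int dfsReplication, Policy policy) {
        if (submitReplication >= dfsReplication) {
            // Normal case: raise replication for better task locality.
            return (short) submitReplication;
        }
        switch (policy) {
            case WARN:
                // Option 1: keep current behavior, but make the misconfiguration visible.
                System.err.println("WARN: mapred.submit.replication (" + submitReplication
                        + ") < dfs.replication (" + dfsReplication + ")");
                return (short) submitReplication;
            case SKIP:
                // Option 2: leave files at dfs.replication, so nothing gets pruned.
                return (short) dfsReplication;
            case FAIL:
            default:
                // Option 3: reject the configuration outright.
                throw new IllegalArgumentException(
                        "mapred.submit.replication must be >= dfs.replication");
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 3, Policy.FAIL)); // 10
        System.out.println(choose(2, 3, Policy.SKIP));  // 3
    }
}
```

The options trade visibility against strictness. WARN changes nothing functionally, SKIP silently avoids the excess-replica pruning, and FAIL forces the operator to fix the config.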
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, Aug 15, 2013 at 6:21 AM, Sirianni, Eric <[EMAIL PROTECTED]> wrote:
>>
>> > In debugging some replication issues in our HDFS environment, I noticed
>> > that the MapReduce framework uses the following algorithm for setting the
>> > replication on submitted job files:
>> >
>> > 1. Create the file with the *default* DFS replication factor
>> >    (i.e. 'dfs.replication').
>> >
>> > 2. Subsequently alter the replication of the file based on the
>> >    'mapred.submit.replication' config value.
>> >
>> >   private static FSDataOutputStream createFile(FileSystem fs, Path splitFile,
>> >       Configuration job) throws IOException {
>> >     // Step 1: create the file with the default replication (dfs.replication).
>> >     FSDataOutputStream out = FileSystem.create(fs, splitFile,
>> >         new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
>> >     // Step 2: reset replication to mapred.submit.replication (default 10).
>> >     int replication = job.getInt("mapred.submit.replication", 10);
>> >     fs.setReplication(splitFile, (short) replication);
>> >     writeSplitHeader(out);
>> >     return out;
>> >   }
>> >
>> > If I understand correctly, the net functional effect of this approach is
>> > that
>> >
>> > - The initial write pipeline is set up with 'dfs.replication' nodes
>> >   (i.e. 3).
>> >
>> > - The namenode triggers additional inter-datanode replications in
>> >   the background (as it detects the blocks as "under-replicated").
>> >
>> > I'm assuming this is intentional?  Alternatively, if the
>> > mapred.submit.replication was specified on initial create, the write
>> > pipeline would be significantly larger.
>> >
>> > The reason I noticed is that we had inadvertently specified
>> > mapred.submit.replication as *less than* dfs.replication in our
>> > configuration, which caused a bunch of excess replica pruning (and
>> > ultimately IOExceptions in our datanode logs).
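The two-step sequence reduces to simple arithmetic on replica counts. The sketch below is a toy model: the class and method names are hypothetical, and the real NameNode also handles placement, racks and throttling. With the defaults, 7 replicas are left to be created in the background. With `mapred.submit.replication` below `dfs.replication`, excess replicas are left to prune. The value 2 vs. 3 here is illustrative, since the actual settings weren't given.

```java
// Toy model of "create with dfs.replication, then setReplication".
// Only replica counts are tracked; names are hypothetical.
class ReplicationSequence {

    /**
     * Positive result: replicas to be added in the background (under-replicated).
     * Negative result: excess replicas to be pruned (over-replicated).
     */
    static int backgroundWork(int dfsReplication, int submitReplication) {
        // Step 1: the write pipeline produces dfs.replication replicas.
        int written = dfsReplication;
        // Step 2: setReplication changes only the target count; the gap is
        // closed later, asynchronously.
        return submitReplication - written;
    }

    public static void main(String[] args) {
        // Defaults: written with 3, target 10 -> 7 background copies.
        System.out.println(backgroundWork(3, 10)); // 7
        // Target below dfs.replication -> 1 excess replica to prune.
        System.out.println(backgroundWork(3, 2));  // -1
    }
}
```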
>> >
>> > Thanks,
>> > Eric
>> >
>> >
>>
Sirianni, Eric 2013-08-19, 14:36
Sirianni, Eric 2013-08-19, 18:01