Ling Kun 2013-03-01, 10:14
Michael Segel 2013-03-01, 13:23
Re: Is there a way to keep all intermediate files there after the MapReduce Job run?
Jean-Marc Spaggiari 2013-03-01, 13:49
Ling, do you have Hadoop: The Definitive Guide close by?

I think I remember they said something in there about keeping the intermediate files. Take a look at keep.task.files.pattern... It might help you keep some of the files you are looking for. Maybe not all... Or maybe not even any.

JM

2013/3/1 Michael Segel <[EMAIL PROTECTED]>:
> Your job.xml file is kept for a set period of time.
> I believe the others are automatically removed.
>
> You can easily access the job.xml file from the JT webpage.
>
> On Mar 1, 2013, at 4:14 AM, Ling Kun <[EMAIL PROTECTED]> wrote:
>
> Dear all,
> In order to know more about file creation and sizes while the job is
> running, I want to keep all the intermediate files there (job.xml,
> spillN.out, file.out, file.index, map.out-N, etc).
>
> My questions are:
> 1. Are there any configurations that can make this happen? Or could I modify
> some Hadoop MapReduce code for this?
>
> 2. Since each job, each task, and each attempt of a task uses different
> directories to store its intermediate files, keeping the files there
> without deleting them will not hurt the MapReduce cluster, apart from taking up
> some storage. Am I right?
>
> Thanks
>
> yours,
> Ling Kun
>
> --
> http://www.lingcc.com
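
For reference, here is a minimal sketch of how the keep.task.files.pattern property mentioned above might be set for a single job on Hadoop 1.x. The value is only an illustration: the pattern is a regular expression matched against task attempt names, and ".*_m_000123_0" would keep the files of the first attempt of map task 123. The keep.failed.task.files property is not mentioned in this thread; it is included here as an assumption that keeping the files of failed tasks is also of interest. Whether either setting preserves every file on the list (spillN.out, file.out, and so on) is not confirmed here and would need checking on a test cluster.

```xml
<!-- Sketch only: per-job settings using the Hadoop 1.x (old mapred API) property names. -->
<!-- The pattern value below is an illustrative example, not a recommendation. -->
<property>
  <name>keep.task.files.pattern</name>
  <!-- Regular expression matched against task attempt names whose files should be kept. -->
  <value>.*_m_000123_0</value>
</property>
<property>
  <name>keep.failed.task.files</name>
  <!-- Assumption: also keep the files of failed tasks; this storage is never reclaimed automatically. -->
  <value>true</value>
</property>
```

The same settings can also be applied from job code through JobConf.setKeepTaskFilesPattern(String) and JobConf.setKeepFailedTaskFiles(boolean). Files kept this way stay on the TaskTracker local disks until they are removed by hand, so it is probably best to enable this only for the jobs being studied.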