Re: Is there a way to keep all intermediate files there after the MapReduce Job run?
Jean-Marc Spaggiari 2013-03-01, 13:49
Ling, do you have Hadoop: The Definitive Guide close-by?
I think I remember it saying something about keeping the intermediate files.
Take a look at keep.task.files.pattern... It might help you to keep
some of the files you are looking for? Maybe not all... Or even maybe
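
To show what I mean, here is a rough sketch in plain Java of how I understand the pattern to work. It assumes the value of keep.task.files.pattern is a regular expression that has to match the whole task attempt ID, which is how I read the JobConf Javadoc for setKeepTaskFilesPattern (".*_m_000123_0" is the example given there). The class name, the wouldKeep helper, and the attempt IDs are made up for illustration; this is not Hadoop code, so check the real behaviour on your version.

```java
import java.util.regex.Pattern;

public class KeepPatternDemo {

    // Assumption: the configured pattern must match the entire attempt ID string.
    // Hypothetical helper, not part of the Hadoop API.
    static boolean wouldKeep(String keepPattern, String attemptId) {
        return Pattern.matches(keepPattern, attemptId);
    }

    public static void main(String[] args) {
        // Example pattern from the JobConf Javadoc: first attempt of map task 123.
        String keepPattern = ".*_m_000123_0";

        // Hypothetical attempt IDs, invented for this sketch.
        String[] attempts = {
            "attempt_201303010414_0001_m_000123_0",
            "attempt_201303010414_0001_m_000123_1",
            "attempt_201303010414_0001_r_000000_0"
        };

        for (String id : attempts) {
            String verdict = wouldKeep(keepPattern, id) ? "keep" : "clean up";
            System.out.println(id + " -> " + verdict);
        }
    }
}
```

If that reading is right, a pattern such as ".*" should match every attempt of the job, but I have not verified that it preserves every file type you listed.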
2013/3/1 Michael Segel <[EMAIL PROTECTED]>:
> Your job.xml file is kept for a set period of time.
> I believe the others are automatically removed.
> You can easily access the job.xml file from the JT webpage.
> On Mar 1, 2013, at 4:14 AM, Ling Kun <[EMAIL PROTECTED]> wrote:
> Dear all,
> To learn more about which files are created, and how large they are, while
> a job is running, I want to keep all the intermediate files in place
> (job.xml, spillN.out, file.out, file.index, map.out-N, etc).
> My questions are:
> 1. Is there any configuration that can make this happen? Or could I modify
> some Hadoop MapReduce code for this?
> 2. Since each job, each task, and each attempt of a task uses a different
> directory to store its intermediate files, keeping the files there
> without deleting them will not hurt the MapReduce cluster, apart from
> taking up some storage. Am I right?
> Ling Kun