MapReduce, mail # user - Is there a way to keep all intermediate files there after the MapReduce Job run?

Re: Is there a way to keep all intermediate files there after the MapReduce Job run?
Jean-Marc Spaggiari 2013-03-01, 13:49
Ling, do you have Hadoop: The Definitive Guide close by?

I think I remember it mentions somewhere how to keep the intermediate files.

Take a look at keep.task.files.pattern... It might help you keep some of
the files you are looking for. Maybe not all of them... Or maybe even
none.
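A minimal sketch of how keep.task.files.pattern picks which tasks keep their files, assuming (as I recall the Definitive Guide describing it) that the value is a Java regular expression matched against the full task attempt ID string. In MRv1 the property goes in the job configuration, e.g. via JobConf.setKeepTaskFilesPattern(...). The helper keepsFiles, the pattern, and the attempt IDs below are hypothetical and only illustrate the matching; this is not Hadoop's own code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class KeepPatternDemo {
    // Hypothetical helper. Assumption: the pattern must match the WHOLE
    // attempt ID (Pattern.matches semantics), not just a substring.
    public static boolean keepsFiles(String keepPattern, String taskAttemptId) {
        return keepPattern != null && Pattern.matches(keepPattern, taskAttemptId);
    }

    public static void main(String[] args) {
        // Hypothetical pattern: first attempt of map task 000001 only.
        String keepPattern = ".*_m_000001_0";
        // Made-up attempt IDs in the MRv1 format
        // attempt_<jobtracker-id>_<job#>_<m|r>_<task#>_<attempt#>.
        List<String> attempts = Arrays.asList(
            "attempt_201303010414_0001_m_000001_0",
            "attempt_201303010414_0001_m_000002_0",
            "attempt_201303010414_0001_r_000000_0");
        for (String id : attempts) {
            System.out.println(id + " -> "
                + (keepsFiles(keepPattern, id) ? "kept" : "cleaned up"));
        }
    }
}
```

Running it prints "kept" for the first ID and "cleaned up" for the other two, which is why a pattern like ".*_m_.*" would be needed to keep the files of every map task, at the cost of more local disk.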

JM

2013/3/1 Michael Segel <[EMAIL PROTECTED]>:
> Your job.xml file is kept for a set period of time.
> I believe the others are automatically removed.
>
> You can easily access the job.xml file from the JobTracker (JT) web page.
>
> On Mar 1, 2013, at 4:14 AM, Ling Kun <[EMAIL PROTECTED]> wrote:
>
> Dear all,
>     To learn more about file creation and file sizes while a job is
> running, I want to keep all the intermediate files (job.xml,
> spillN.out, file.out, file.index, map.out-N, etc.) after the job finishes.
>
> My questions are:
> 1. Is there any configuration that can make this happen? Or could I modify
> some Hadoop MapReduce code to do this?
>
> 2. Since each job, each task, and each task attempt uses a different
> directory to store its intermediate files, keeping the files instead of
> deleting them should not hurt the MapReduce cluster, apart from taking up
> some storage. Am I right?
>
> Thanks
>
> yours,
> Ling Kun
>
> --
> http://www.lingcc.com
>
>
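For Ling's goal of seeing which intermediate files exist and how large they are once they are kept, a minimal sketch that walks a local directory and lists every regular file with its size, largest first. The directory to scan is an assumption: pass whichever TaskTracker local directory (for example one entry of mapred.local.dir) holds the retained files. No file-naming convention is assumed, and the class name LocalDirInventory is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

public class LocalDirInventory {
    // Maps each regular file under root (as a path relative to root)
    // to its size in bytes, with the largest files first.
    public static Map<Path, Long> fileSizes(Path root) throws IOException {
        Map<Path, Long> sizes = new LinkedHashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .sorted((a, b) -> Long.compare(size(b), size(a)))
                 .forEach(p -> sizes.put(root.relativize(p), size(p)));
        }
        return sizes;
    }

    // Files.size throws a checked exception, so wrap it for use in lambdas.
    private static long size(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sum of all file sizes in the inventory.
    public static long totalBytes(Map<Path, Long> sizes) {
        return sizes.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory to scan is given as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, Long> sizes = fileSizes(root);
        sizes.forEach((p, n) -> System.out.printf("%12d  %s%n", n, p));
        System.out.printf("%12d  total (%d files)%n", totalBytes(sizes), sizes.size());
    }
}
```

Running it against the same directory before and after a job gives a rough picture of how much local storage the retained files take up, which is the cost Ling asks about in question 2.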