The temp files between MR jobs are stored on DFS. They have to be on DFS because they are the inputs to the next MR job.
On 9/13/10 3:15 PM, "jiang licht" <[EMAIL PROTECTED]> wrote:
All these settings should point to non-DFS folders. But I saw some Pig jobs save intermediate outputs to "/tmp" in HDFS (maybe only in "interactive mode"; I am not sure I remember correctly and will check this), which means they get replicated and use much more space.
--- On Mon, 9/13/10, Mr. Jan Walter <[EMAIL PROTECTED]> wrote:
From: Mr. Jan Walter <[EMAIL PROTECTED]>
Subject: Re: specify temp folder?
To: [EMAIL PROTECTED]
Date: Monday, September 13, 2010, 5:01 PM
Set the following parameter in your workers' mapred-site.xml, and change the
value to what you want:
<description> To set the value of tmp directory for map and reduce tasks.
If the value is an absolute path, it is directly assigned. Otherwise, it is
prepended with task's working directory. The java tasks are executed with
option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
streaming are set with environment variable,
TMPDIR='the absolute path of the tmp dir'</description>
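
The property name and value did not survive in the message above. The description text matches the one shipped for mapred.child.tmp in Hadoop's mapred-default.xml, so the stanza was presumably along the lines of the sketch below. The path /data/tmp/mapred is a made-up placeholder, not a value from this thread.

```xml
<!-- mapred-site.xml on each worker node (sketch, not the poster's actual file) -->
<property>
  <!-- Property name inferred from the description text; it is not present in the message -->
  <name>mapred.child.tmp</name>
  <!-- Placeholder path (assumption). An absolute path is used as-is;
       a relative one is resolved against the task's working directory -->
  <value>/data/tmp/mapred</value>
</property>
```

Note that this only moves the local scratch directory of the map and reduce task processes. It does not change where Pig writes the intermediate results that are passed between MR jobs.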
In core-site.xml, set the hadoop.tmp.dir property the same way as above. I am
not sure how they all interrelate.
There is also a tmpdir variable for the JVM; I am not sure what reads that. I
just set them all to the same value.
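
For the core-site.xml part, a minimal sketch could look like the following. The path /data/tmp/hadoop is a hypothetical placeholder. On how the settings interrelate: in the stock defaults, hadoop.tmp.dir serves as the base for other local directories (for example, mapred.local.dir defaults to ${hadoop.tmp.dir}/mapred/local), so changing it moves those along unless they are set explicitly.

```xml
<!-- core-site.xml (sketch; the path is a placeholder, not a value from this thread) -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Base for other temporary directories on the local file system -->
  <value>/data/tmp/hadoop</value>
</property>
```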
----- Original Message ----
> From: jiang licht <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Mon, September 13, 2010 5:23:12 PM
> Subject: specify temp folder?
> It seems that Pig generates some folders/files under "/tmp" in HDFS for Pig
> jobs. I remember that Hadoop saves such intermediate results (map output, etc.)
> in non-HDFS folders, which are specified in mapred-site.xml. So, is there a way
> to tell Pig to store such data in a non-HDFS folder?