Pig >> mail # user >> specify temp folder?

Re: specify temp folder?
Pig stores the temp files between MR jobs on DFS. They have to be on DFS because they are the inputs to the next MR job.

On 9/13/10 3:15 PM, "jiang licht" <[EMAIL PROTECTED]> wrote:

All these settings should point to non-DFS folders. But I saw some Pig jobs save intermediate outputs to "/tmp" in HDFS (maybe only in "interactive mode"; I am not sure I remember correctly and will check). That means the outputs get replicated and use much more space.



--- On Mon, 9/13/10, Mr. Jan Walter <[EMAIL PROTECTED]> wrote:

From: Mr. Jan Walter <[EMAIL PROTECTED]>
Subject: Re: specify temp folder?
Date: Monday, September 13, 2010, 5:01 PM

Set the following parameter in your workers' mapred-site.xml, and change the
value to what you want:

  <description>To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
  TMPDIR='the absolute path of the tmp dir'
  </description>

In core-site.xml, set the hadoop.tmp.dir property the same way as above. I am
not sure how they all interrelate.
There is also a tmpdir variable for the JVM (java.io.tmpdir); I am not sure
what reads that, so I just set them all to the same value.
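
As a rough sketch of the two settings described above: the quoted description
matches the mapred.child.tmp entry in mapred-default.xml, so this assumes that
is the property meant. The path /data/tmp is a hypothetical placeholder, not a
recommended value. Each property goes inside the <configuration> element of
its file.

```xml
<!-- mapred-site.xml: assumes the property meant is mapred.child.tmp -->
<property>
  <name>mapred.child.tmp</name>
  <!-- hypothetical absolute path on the worker's local disk -->
  <value>/data/tmp</value>
</property>

<!-- core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- hypothetical path, set to the same placeholder as above -->
  <value>/data/tmp</value>
</property>
```

These are local, non-DFS locations on each worker. They would not be expected
to change where Pig writes its intermediate files in HDFS.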

----- Original Message ----
> From: jiang licht <[EMAIL PROTECTED]>
> Sent: Mon, September 13, 2010 5:23:12 PM
> Subject: specify temp folder?
>
> It seems that pig generates some folders/files under "/tmp" in HDFS for pig
> jobs. I remember that hadoop saves such intermediate results (map output, etc.)
> in non-hdfs folders, which are specified in mapred-site.xml. So, is there a way
> to tell pig to store such data to a non-hdfs folder?
>
> Thanks,
> Michael