Re: large intermediate outputs
Allen Wittenauer 2011-01-04, 04:32
On Jan 3, 2011, at 5:11 AM, Debbie Fu wrote:
> I think it will cause a disk fill-up, too. Is there any mechanism in Hadoop
> that handles this situation?
Not in a way that saves the job.
> If my local disk stores so much chunk data that it leaves little space for
> intermediate output, and all nodes are in the same situation, so that we
> can't schedule the task on another node with enough space for intermediate
> output, what does Hadoop do? Does the job simply fail?
> Can I set a remote disk in mapred.local.dir?
You can point it to an NFS mount, but that'd be suicide.
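Instead of a remote mount, it can help to check how much room the local directories actually have before running a job. mapred.local.dir takes a comma-separated list of local directories, and the sketch below reports the free space of each entry in such a list. The function name `local_dir_free_space` is hypothetical and not part of Hadoop; it only mimics parsing the property value and assumes the directories already exist on the node where it runs.

```python
import shutil


def local_dir_free_space(local_dirs):
    """Return free bytes for each directory in a comma-separated list.

    Hypothetical helper: the input mimics a mapred.local.dir value.
    Every listed directory must exist, or shutil.disk_usage raises OSError.
    """
    free = {}
    for d in local_dirs.split(","):
        d = d.strip()
        if not d:
            continue  # tolerate blank entries and trailing commas
        free[d] = shutil.disk_usage(d).free
    return free


if __name__ == "__main__":
    # Example: run on a node with the same value as its mapred.local.dir
    for path, nbytes in local_dir_free_space(".").items():
        print(f"{path}: {nbytes / 2**30:.1f} GiB free")
```

Running this on every node gives a rough picture of whether any node could still take the task.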
If using compression, as Harsh mentioned, is not acceptable, your best bet is to break the job up into multiple jobs or to reduce the input per task, depending on the situation.
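To decide how many pieces to break the job into, a back-of-the-envelope estimate may be enough. The sketch below is a simplification, not anything Hadoop computes. It assumes an expansion factor (intermediate bytes per input byte) that you measure from a small sample run, it treats free space as evenly spread across nodes, and it reserves some headroom, chosen arbitrarily here, for other local files. `jobs_needed` and its parameters are hypothetical names. A `compression_ratio` below 1.0 models compressing the map output.

```python
import math


def jobs_needed(input_bytes, expansion, free_bytes,
                compression_ratio=1.0, headroom=0.8):
    """Smallest number of equal-size jobs whose intermediate output fits.

    Hypothetical sizing helper (assumes evenly distributed data and space):
    input_bytes       -- total job input
    expansion         -- intermediate bytes per input byte (measure it)
    free_bytes        -- usable space across all nodes' local dirs
    compression_ratio -- compressed / raw map output size (1.0 = none)
    headroom          -- fraction of free_bytes we allow ourselves to use
    """
    if input_bytes <= 0:
        return 0
    budget = free_bytes * headroom
    if budget <= 0:
        raise ValueError("no usable local disk space")
    intermediate = input_bytes * expansion * compression_ratio
    return max(1, math.ceil(intermediate / budget))


if __name__ == "__main__":
    # 10 TB of input that triples in size, with 8 TB free cluster-wide
    print(jobs_needed(10 * 10**12, 3, 8 * 10**12))        # uncompressed
    print(jobs_needed(10 * 10**12, 3, 8 * 10**12, 0.25))  # 4:1 compression
```

With these made-up numbers the uncompressed case needs 5 jobs and the compressed case needs 2. The same arithmetic applies per task if you plug in one node's free space and one task's input.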