Hadoop, mail # user - can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining


Jane Wayne 2012-03-08, 04:28
Jie Li 2012-03-08, 05:02
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jane Wayne 2012-03-08, 05:06
Jie,

so if i set the number of reduce tasks to 0, do i need to specify the
reducer (or should i set it to null)? if i don't specify the reducer, and just
have a mapper, where do all the mapper output key-value pairs go? do they
get serialized to disk/HDFS automagically?

On Thu, Mar 8, 2012 at 12:02 AM, Jie Li <[EMAIL PROTECTED]> wrote:

> Hi Jane,
>
> The default Reducer (IdentityReducer) would simply read/write everything
> that goes through it. By default Shuffling would also happen and the map
> output data is partitioned by the HashPartitioner.
>
> If you don't need the shuffle/reduce, you need to explicitly set the number
> of the reduce tasks to zero via JobConf's setNumReduceTasks(int num).
>
> Hope that helps.
>
> Jie
>
> On Wed, Mar 7, 2012 at 11:28 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
>
> > i have a Mapper and Reducer as a part of a job. all my data transformation
> > occurs in the mapper, and there is absolutely nothing that needs to be done
> > in the reducer. when i set the reducer on the Job, i simply use the
> > Reducer.class.
> >
> > i notice that after the mapper tasks have reached 100%, the time until
> > reducing starts is very long. when reducing starts, i get an
> > FSError: java.io.IOException: No space left on device. i checked the dfs
> > health (via web page), and i still have 42.41% DFS remaining. why does this
> > occur? i see that eventually 4 attempts are made to call Reducer, however,
> > they all end up with the IOException mentioned. the output is at the bottom.
> > notice that the percentage goes up then back down to 0% before the
> > IOException.
> >
> > also, i want to know if i can just subclass Reducer or do something about
> > shuffling and sorting as these steps are not important. i just want each
> > record emitted from the Mapper to go straight to disk. is it possible to do
> > this without going through Reducer? i am thinking this is part of the
> > problem for taking so long between 100% map and the first sign of reduce.
> >
> > EXAMPLE OUTPUT
> >
> > 12/03/07 22:38:45 INFO mapred.JobClient:  map 98% reduce 0%
> > 12/03/07 22:39:18 INFO mapred.JobClient:  map 99% reduce 0%
> > 12/03/07 22:39:43 INFO mapred.JobClient:  map 100% reduce 0%
> > 12/03/07 22:58:14 INFO mapred.JobClient:  map 100% reduce 1%
> > 12/03/07 22:58:23 INFO mapred.JobClient:  map 100% reduce 3%
> > 12/03/07 22:58:38 INFO mapred.JobClient:  map 100% reduce 6%
> > 12/03/07 22:58:57 INFO mapred.JobClient:  map 100% reduce 7%
> > 12/03/07 22:59:21 INFO mapred.JobClient:  map 100% reduce 9%
> > 12/03/07 23:00:00 INFO mapred.JobClient:  map 100% reduce 10%
> > 12/03/07 23:00:09 INFO mapred.JobClient:  map 100% reduce 12%
> > 12/03/07 23:00:58 INFO mapred.JobClient:  map 100% reduce 0%
> > 12/03/07 23:01:00 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_0, Status : FAILED
> > FSError: java.io.IOException: No space left on device
> > FSError: java.io.IOException: No space left on device
> > FSError: java.io.IOException: No space left on device
> > FSError: java.io.IOException: No space left on device
> > FSError: java.io.IOException: No space left on device
> > FSError: java.io.IOException: No space left on device
> > attempt_201203071517_0043_r_000000_0: log4j:ERROR Failed to flush writer,
> > attempt_201203071517_0043_r_000000_0: java.io.IOException: No space left on device
> > 12/03/07 23:01:31 INFO mapred.JobClient:  map 100% reduce 1%
> > 12/03/07 23:01:34 INFO mapred.JobClient:  map 100% reduce 3%
> > 12/03/07 23:01:37 INFO mapred.JobClient:  map 100% reduce 4%
> > 12/03/07 23:01:49 INFO mapred.JobClient:  map 100% reduce 6%
> > 12/03/07 23:01:55 INFO mapred.JobClient:  map 100% reduce 7%
> > 12/03/07 23:02:19 INFO mapred.JobClient:  map 100% reduce 9%
> > 12/03/07 23:02:52 INFO mapred.JobClient:  map 100% reduce 0%
> > 12/03/07 23:02:54 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_1, Status : FAILED
> > FSError: java.io.IOException
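
A note on the "No space left on device" error quoted above: the messages shown here do not diagnose it, but one plausible cause is that intermediate map output and the reduce-side shuffle are written to each node's local disks (the directories listed in mapred.local.dir), not to DFS, so the DFS health page can report free space while a local partition is full. The sketch below is not from the thread. It uses only the Java standard library to report usable space for a comma-separated directory list; the class name, the method names, and the 1 GiB default threshold are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LocalDirSpaceCheck {

    // Splits a comma-separated directory list (the format used by mapred.local.dir).
    static List<File> parseDirs(String commaSeparated) {
        List<File> dirs = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        return dirs;
    }

    // Returns the directories whose partition has fewer usable bytes than required.
    // getUsableSpace() returns 0 when the path does not name a partition,
    // so a missing directory is reported as low for any positive threshold.
    static List<File> lowSpaceDirs(List<File> dirs, long requiredBytes) {
        List<File> low = new ArrayList<>();
        for (File dir : dirs) {
            if (dir.getUsableSpace() < requiredBytes) {
                low.add(dir);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Assumed defaults: the JVM temp dir and a 1 GiB threshold.
        // Pass your own mapred.local.dir value and byte threshold as arguments.
        String localDirs = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        long requiredBytes = args.length > 1 ? Long.parseLong(args[1]) : 1L << 30;

        List<File> dirs = parseDirs(localDirs);
        for (File dir : dirs) {
            System.out.printf("%s: %d bytes usable%n", dir, dir.getUsableSpace());
        }
        for (File dir : lowSpaceDirs(dirs, requiredBytes)) {
            System.out.printf("LOW: %s has less than %d bytes usable%n", dir, requiredBytes);
        }
    }
}
```

Run it on a task node with that node's mapred.local.dir value as the first argument. If a directory shows up as LOW while DFS still has room, the shuffle is the likely culprit, and setting the number of reduce tasks to zero as Jie suggests would avoid the shuffle entirely.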
Jie Li 2012-03-08, 05:19
Jane Wayne 2012-03-08, 05:28