Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Hi Jane,

The default Reducer (IdentityReducer) simply writes out every record it
reads, unchanged. The shuffle also still happens by default, and the map
output data is partitioned by the HashPartitioner.
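
To illustrate the partitioning step, here is a minimal plain-Java sketch of
the arithmetic HashPartitioner applies to each map output key. The class
name HashPartitionSketch, the method partitionFor and the sample keys are
made up for illustration; only the hash-and-modulo expression mirrors
Hadoop's partitioner.

```java
public class HashPartitionSketch {

    // Mask off the sign bit so the result is never negative, then take the
    // remainder by the number of reduce tasks to pick a reducer index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma"}; // hypothetical map output keys
        int numReduceTasks = 4;                     // hypothetical reducer count
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Every record with the same key lands on the same reducer, which is why the
framework has to move and sort the map output even when the reducer itself
does nothing.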

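If you don't need the shuffle/reduce, you need to explicitly set the number
of reduce tasks to zero via JobConf's setNumReduceTasks(int num). With zero
reduce tasks the job is map-only: there is no sort or shuffle, and each
mapper's output is written directly to the job's output path.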
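
A minimal sketch of a map-only job driver using the old
org.apache.hadoop.mapred API, assuming the Hadoop jars are on the classpath.
The class name MapOnlyJob, the job name and the use of IdentityMapper are
placeholders; substitute your own Mapper and input/output settings.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only-example");        // placeholder name

        // Placeholder mapper: replace with the Mapper that does your transformation.
        conf.setMapperClass(IdentityMapper.class);

        // Zero reduce tasks: no shuffle, no sort, no reduce phase.
        conf.setNumReduceTasks(0);

        // args[0] = input path, args[1] = output path (must not already exist).
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

If you are on the newer org.apache.hadoop.mapreduce API, the Job class has a
setNumReduceTasks(int) method as well.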

Hope that helps.

Jie

On Wed, Mar 7, 2012 at 11:28 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:

> i have a Mapper and Reducer as a part of a job. all my data transformation
> occurs in the mapper, and there is absolutely nothing that needs to be done
> in the reducer. when i set the reducer on the Job, i simply use the
> Reducer.class.
>
> i notice that after the mapper tasks have reached 100%, then the time until
> reducing starts is very long. when reducing starts then i get a
> java.io.IOException: No space left on deviceFSError. i checked the dfs
> health (via web page), and i still have 42.41% DFS remaining. why does this
> occur? i see that eventually 4 attempts are made to call Reducer, however,
> they all end up with the IOException mentioned. at the bottom is an output.
> notice that the percentage goes up then back down to 0% before the
> IOException.
>
> also, i want to know if i can just subclass Reducer or do something about
> shuffling and sorting as these steps are not important. i just want each
> record emitted from the Mapper to go straight to disk. is it possible to do
> this without going through Reducer? i am thinking this is part of the
> problem for taking so long between 100% map and the first sign of reduce.
>
> EXAMPLE OUTPUT
>
> 12/03/07 22:38:45 INFO mapred.JobClient:  map 98% reduce 0%
> 12/03/07 22:39:18 INFO mapred.JobClient:  map 99% reduce 0%
> 12/03/07 22:39:43 INFO mapred.JobClient:  map 100% reduce 0%
> 12/03/07 22:58:14 INFO mapred.JobClient:  map 100% reduce 1%
> 12/03/07 22:58:23 INFO mapred.JobClient:  map 100% reduce 3%
> 12/03/07 22:58:38 INFO mapred.JobClient:  map 100% reduce 6%
> 12/03/07 22:58:57 INFO mapred.JobClient:  map 100% reduce 7%
> 12/03/07 22:59:21 INFO mapred.JobClient:  map 100% reduce 9%
> 12/03/07 23:00:00 INFO mapred.JobClient:  map 100% reduce 10%
> 12/03/07 23:00:09 INFO mapred.JobClient:  map 100% reduce 12%
> 12/03/07 23:00:58 INFO mapred.JobClient:  map 100% reduce 0%
> 12/03/07 23:01:00 INFO mapred.JobClient: Task Id :
> attempt_201203071517_0043_r_000000_0, Status : FAILED
> FSError: java.io.IOException: No space left on deviceFSError:
> java.io.IOException: No space left on deviceFSError: java.io.IOException:
> No space left on deviceFSError: java.io.IOException: No space left on
> deviceFSError: java.io.IOException: No space left on deviceFSError:
> java.io.IOException: No space left on device
> attempt_201203071517_0043_r_000000_0: log4j:ERROR Failed to flush writer,
> attempt_201203071517_0043_r_000000_0: java.io.IOException: No space left on
> device
> 12/03/07 23:01:31 INFO mapred.JobClient:  map 100% reduce 1%
> 12/03/07 23:01:34 INFO mapred.JobClient:  map 100% reduce 3%
> 12/03/07 23:01:37 INFO mapred.JobClient:  map 100% reduce 4%
> 12/03/07 23:01:49 INFO mapred.JobClient:  map 100% reduce 6%
> 12/03/07 23:01:55 INFO mapred.JobClient:  map 100% reduce 7%
> 12/03/07 23:02:19 INFO mapred.JobClient:  map 100% reduce 9%
> 12/03/07 23:02:52 INFO mapred.JobClient:  map 100% reduce 0%
> 12/03/07 23:02:54 INFO mapred.JobClient: Task Id :
> attempt_201203071517_0043_r_000000_1, Status : FAILED
> FSError: java.io.IOException: No space left on deviceFSError:
> java.io.IOException: No space left on deviceFSError: java.io.IOException:
> No space left on device
>