Hadoop >> mail # user >> can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining


Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
You don't need to specify the reducer at all.

Yes, the map output will go to HDFS directly. This is called a map-only job.

Jie
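
The quoted reply below mentions that map output is partitioned by the HashPartitioner whenever reducers are present. As a rough illustration of that step, here is a minimal standalone sketch that mirrors the partitioner's arithmetic (hash code with the sign bit cleared, modulo the number of reduce tasks) without depending on Hadoop. The class name PartitionSketch, the method partitionFor, the sample keys and the reducer count of 4 are all hypothetical. With the number of reduce tasks set to zero, this step should not run at all, since each mapper writes its output directly.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch: mirrors the arithmetic of Hadoop's default
// HashPartitioner without using Hadoop. Names here are hypothetical.
class PartitionSketch {

    // Clear the sign bit so the value is non-negative, then take the
    // remainder by the number of reduce tasks to pick a reducer index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alpha", "beta", "gamma");
        int numReduceTasks = 4; // assumed value, for illustration only
        for (String key : keys) {
            System.out.println(key + " -> reducer "
                    + partitionFor(key, numReduceTasks));
        }
    }
}
```

Every record with the same key lands on the same reducer, which is why the framework has to move and sort map output before reducing can start. A map-only job avoids that work entirely.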

On Thursday, March 8, 2012, Jane Wayne wrote:

> Jie,
>
> so if i set the number of reduce tasks to 0, do i need to specify the
> reducer (or should i set it null)? if i don't specify the reducer, and just
> have a mapper, where do all the mapper output key-value pair go to? do they
> get serialized to disk/HDFS automagically?
>
> On Thu, Mar 8, 2012 at 12:02 AM, Jie Li <[EMAIL PROTECTED]> wrote:
>
> > Hi Jane,
> >
> > The default Reducer (IdentityReducer) would simply read/write everything
> > that goes through it. By default Shuffling would also happen and the map
> > output data is partitioned by the HashPartitioner.
> >
> > If you don't need the shuffle/reduce, you need to explicitly set the
> > number of reduce tasks to zero via JobConf's setNumReduceTasks(int num).
> >
> > Hope that helps.
> >
> > Jie
> >
> > On Wed, Mar 7, 2012 at 11:28 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
> >
> > > i have a Mapper and Reducer as a part of a job. all my data transformation
> > > occurs in the mapper, and there is absolutely nothing that needs to be done
> > > in the reducer. when i set the reducer on the Job, i simply use the
> > > Reducer.class.
> > >
> > > i notice that after the mapper tasks have reached 100%, the time until
> > > reducing starts is very long. when reducing starts, i get an
> > > FSError: java.io.IOException: No space left on device. i checked the dfs
> > > health (via web page), and i still have 42.41% DFS remaining. why does this
> > > occur? i see that eventually 4 attempts are made to call Reducer; however,
> > > they all end up with the IOException mentioned. example output is at the
> > > bottom. notice that the percentage goes up then back down to 0% before the
> > > IOException.
> > >
> > > also, i want to know if i can just subclass Reducer or do something about
> > > shuffling and sorting as these steps are not important. i just want each
> > > record emitted from the Mapper to go straight to disk. is it possible to do
> > > this without going through Reducer? i am thinking this is part of the
> > > problem for taking so long between 100% map and the first sign of reduce.
> > >
> > > EXAMPLE OUTPUT
> > >
> > > 12/03/07 22:38:45 INFO mapred.JobClient:  map 98% reduce 0%
> > > 12/03/07 22:39:18 INFO mapred.JobClient:  map 99% reduce 0%
> > > 12/03/07 22:39:43 INFO mapred.JobClient:  map 100% reduce 0%
> > > 12/03/07 22:58:14 INFO mapred.JobClient:  map 100% reduce 1%
> > > 12/03/07 22:58:23 INFO mapred.JobClient:  map 100% reduce 3%
> > > 12/03/07 22:58:38 INFO mapred.JobClient:  map 100% reduce 6%
> > > 12/03/07 22:58:57 INFO mapred.JobClient:  map 100% reduce 7%
> > > 12/03/07 22:59:21 INFO mapred.JobClient:  map 100% reduce 9%
> > > 12/03/07 23:00:00 INFO mapred.JobClient:  map 100% reduce 10%
> > > 12/03/07 23:00:09 INFO mapred.JobClient:  map 100% reduce 12%
> > > 12/03/07 23:00:58 INFO mapred.JobClient:  map 100% reduce 0%
> > > 12/03/07 23:01:00 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_0, Status : FAILED
> > > FSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on device
> > > attempt_201203071517_0043_r_000000_0: log4j:ERROR Failed to flush writer,
> > > attempt_201203071517_0043_r_000000_0: java.io.IOException: No space left on device
> > > 12/03/07 23:01:31 INFO mapred.JobClient:  map 100% reduce 1%
> > > 12/03/07 23:01:34 INFO mapred.JobClient:  map 100% reduce 3%
> > > 12/03/07 23:01:37 INFO mapred.JobClient:  map 100% reduce 4%
> > > 12/03/07 23:01:49 INFO mapred.JobClient:  map 100% reduce 6%