Hadoop >> mail # user >> can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining


Jane Wayne 2012-03-08, 04:28
Jie Li 2012-03-08, 05:02
Jane Wayne 2012-03-08, 05:06
Jie Li 2012-03-08, 05:19
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
thanks Jie. that worked. instead of part-r-00000, i just get part-m-00000.
so, that's no problem.

however, i'm going to see if i still get that IOException complaining about
no more free disk space.
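
The change from part-r-00000 to part-m-00000 follows from the job having no reduce phase: each map task writes its own output file, so the file name carries "m" (map) rather than "r" (reduce). Below is a minimal sketch of that naming pattern, assuming only the `part-<m|r>-<5-digit number>` form seen in this thread. `PartFileName` and `partFileName` are hypothetical names for illustration and are not part of the Hadoop API.

```java
public class PartFileName {

    // Builds a name in the "part-<m|r>-<5-digit task number>" form seen in the thread.
    // mapOnly = true models a job with zero reduce tasks, where map tasks write the output.
    public static String partFileName(boolean mapOnly, int taskIndex) {
        char kind = mapOnly ? 'm' : 'r';
        return String.format("part-%c-%05d", kind, taskIndex);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(false, 0)); // prints "part-r-00000"
        System.out.println(partFileName(true, 0));  // prints "part-m-00000"
    }
}
```

With more than one map task, a map-only job should leave one such file per map task in the output directory, so downstream code that expects a single part-r-00000 may need to read all part-m-* files instead.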

On Thu, Mar 8, 2012 at 12:19 AM, Jie Li <[EMAIL PROTECTED]> wrote:

> You don't need to specify the reducer at all.
>
> Yeah, the map output will go to HDFS directly. It's called a map-only job.
>
> Jie
>
> On Thursday, March 8, 2012, Jane Wayne wrote:
>
> > Jie,
> >
> > so if i set the number of reduce tasks to 0, do i need to specify the
> > reducer (or should i set it null)? if i don't specify the reducer, and just
> > have a mapper, where do all the mapper output key-value pairs go to? do they
> > get serialized to disk/HDFS automagically?
> >
> > On Thu, Mar 8, 2012 at 12:02 AM, Jie Li <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jane,
> > >
> > > The default Reducer (IdentityReducer) would simply read/write everything
> > > that goes through it. By default Shuffling would also happen and the map
> > > output data is partitioned by the HashPartitioner.
> > >
> > > If you don't need the shuffle/reduce, you need to explicitly set the number
> > > of the reduce tasks to zero via JobConf's setNumReduceTasks(int num).
> > >
> > > Hope that helps.
> > >
> > > Jie
> > >
> > > On Wed, Mar 7, 2012 at 11:28 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
> > >
> > > > i have a Mapper and Reducer as a part of a job. all my data transformation
> > > > occurs in the mapper, and there is absolutely nothing that needs to be done
> > > > in the reducer. when i set the reducer on the Job, i simply use the
> > > > Reducer.class.
> > > >
> > > > i notice that after the mapper tasks have reached 100%, the time until
> > > > reducing starts is very long. when reducing starts, i get an
> > > > FSError: java.io.IOException: No space left on device. i checked the dfs
> > > > health (via web page), and i still have 42.41% DFS remaining. why does this
> > > > occur? i see that eventually 4 attempts are made to call Reducer, however,
> > > > they all end up with the IOException mentioned. example output is at the
> > > > bottom. notice that the percentage goes up then back down to 0% before the
> > > > IOException.
> > > >
> > > > also, i want to know if i can just subclass Reducer or do something about
> > > > shuffling and sorting as these steps are not important. i just want each
> > > > record emitted from the Mapper to go straight to disk. is it possible to do
> > > > this without going through Reducer? i am thinking this is part of the
> > > > problem for taking so long between 100% map and the first sign of reduce.
> > > >
> > > > EXAMPLE OUTPUT
> > > >
> > > > 12/03/07 22:38:45 INFO mapred.JobClient:  map 98% reduce 0%
> > > > 12/03/07 22:39:18 INFO mapred.JobClient:  map 99% reduce 0%
> > > > 12/03/07 22:39:43 INFO mapred.JobClient:  map 100% reduce 0%
> > > > 12/03/07 22:58:14 INFO mapred.JobClient:  map 100% reduce 1%
> > > > 12/03/07 22:58:23 INFO mapred.JobClient:  map 100% reduce 3%
> > > > 12/03/07 22:58:38 INFO mapred.JobClient:  map 100% reduce 6%
> > > > 12/03/07 22:58:57 INFO mapred.JobClient:  map 100% reduce 7%
> > > > 12/03/07 22:59:21 INFO mapred.JobClient:  map 100% reduce 9%
> > > > 12/03/07 23:00:00 INFO mapred.JobClient:  map 100% reduce 10%
> > > > 12/03/07 23:00:09 INFO mapred.JobClient:  map 100% reduce 12%
> > > > 12/03/07 23:00:58 INFO mapred.JobClient:  map 100% reduce 0%
> > > > 12/03/07 23:01:00 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_0, Status : FAILED
> > > > FSError: java.io.IOException: No space left on device
> > > > FSError: java.io.IOException: No space left on device
> > > > FSError: java.io.IOException: No space left on device
> > > > FSError: java.io.IOException: No space left on device
> > > > FSError: java.io.IOException: No space left on device
> > > > FSError: java.io.IOException: No space left on device
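
On the question of why the job can fail with "No space left on device" while the DFS health page still shows 42.41% remaining: that message comes from the local operating system, and the DFS figure only describes HDFS capacity. Intermediate map output and the reduce-side merge are written to the local directories of the node running the task (configured through mapred.local.dir in Hadoop releases of this period), so a local partition can fill up independently of HDFS. This is a likely explanation rather than a confirmed diagnosis for this cluster. Below is a minimal sketch for checking free space on a local directory with plain Java. `LocalDiskCheck` and `percentUsable` are hypothetical names, and the directory to check is an assumption that should be replaced with the node's actual local directories.

```java
import java.io.File;

public class LocalDiskCheck {

    // Returns the percentage of the partition holding 'dir' that is still usable.
    // Returns 0.0 if the path does not name a partition (for example, it does not exist).
    public static double percentUsable(File dir) {
        long total = dir.getTotalSpace();
        if (total == 0L) {
            return 0.0;
        }
        return 100.0 * dir.getUsableSpace() / total;
    }

    public static void main(String[] args) {
        // Assumption: pass the local task directory as the first argument;
        // the JVM temp directory is used only as a stand-in default.
        String path = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        File dir = new File(path);
        System.out.printf("%s: %.2f%% usable (%d of %d bytes)%n",
                path, percentUsable(dir), dir.getUsableSpace(), dir.getTotalSpace());
    }
}
```

Running this on each worker node against its local task directories, while the job is between 100% map and the start of reduce, would show whether local disk is the resource that runs out.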