can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jane Wayne 2012-03-08, 04:28
i have a Mapper and Reducer as a part of a job. all my data transformation occurs in the mapper, and there is absolutely nothing that needs to be done in the reducer. when i set the reducer on the Job, i simply use the Reducer.class.
i notice that after the mapper tasks reach 100%, it takes a very long time before reducing starts. when reducing does start, i get a java.io.IOException: No space left on device (reported as an FSError). i checked the dfs health (via the web page), and i still have 42.41% DFS remaining. why does this occur? i see that eventually 4 attempts are made to run the Reducer; however, they all end with the IOException mentioned. sample output is at the bottom. notice that the reduce percentage goes up and then drops back to 0% before the IOException.
also, i want to know if i can just subclass Reducer or do something about shuffling and sorting as these steps are not important. i just want each record emitted from the Mapper to go straight to disk. is it possible to do this without going through Reducer? i am thinking this is part of the problem for taking so long between 100% map and the first sign of reduce.
EXAMPLE OUTPUT
12/03/07 22:38:45 INFO mapred.JobClient: map 98% reduce 0%
12/03/07 22:39:18 INFO mapred.JobClient: map 99% reduce 0%
12/03/07 22:39:43 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 22:58:14 INFO mapred.JobClient: map 100% reduce 1%
12/03/07 22:58:23 INFO mapred.JobClient: map 100% reduce 3%
12/03/07 22:58:38 INFO mapred.JobClient: map 100% reduce 6%
12/03/07 22:58:57 INFO mapred.JobClient: map 100% reduce 7%
12/03/07 22:59:21 INFO mapred.JobClient: map 100% reduce 9%
12/03/07 23:00:00 INFO mapred.JobClient: map 100% reduce 10%
12/03/07 23:00:09 INFO mapred.JobClient: map 100% reduce 12%
12/03/07 23:00:58 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 23:01:00 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_0, Status : FAILED
FSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on device
attempt_201203071517_0043_r_000000_0: log4j:ERROR Failed to flush writer,
attempt_201203071517_0043_r_000000_0: java.io.IOException: No space left on device
12/03/07 23:01:31 INFO mapred.JobClient: map 100% reduce 1%
12/03/07 23:01:34 INFO mapred.JobClient: map 100% reduce 3%
12/03/07 23:01:37 INFO mapred.JobClient: map 100% reduce 4%
12/03/07 23:01:49 INFO mapred.JobClient: map 100% reduce 6%
12/03/07 23:01:55 INFO mapred.JobClient: map 100% reduce 7%
12/03/07 23:02:19 INFO mapred.JobClient: map 100% reduce 9%
12/03/07 23:02:52 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 23:02:54 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_1, Status : FAILED
FSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on device
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jie Li 2012-03-08, 05:02
Hi Jane,
The default Reducer (IdentityReducer) simply reads and writes everything that passes through it. By default, shuffling also happens, and the map output data is partitioned by the HashPartitioner.
If you don't need the shuffle/reduce, you need to explicitly set the number of reduce tasks to zero via JobConf's setNumReduceTasks(int num).
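A minimal sketch of such a job, assuming the newer org.apache.hadoop.mapreduce API (which the use of Job and Reducer.class in the original question suggests), where the Job class offers the same setNumReduceTasks(int) method as JobConf. The class name MapOnlyDriver and the input/output path arguments are hypothetical, and the base Mapper stands in for whatever Mapper the job really uses.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver; the class name and paths are illustrative only.
public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This constructor exists in the 0.20/1.x API; later releases
        // deprecate it in favor of Job.getInstance(conf, name).
        Job job = new Job(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);

        // The base Mapper passes records through unchanged.
        // Substitute the Mapper that does the real transformation.
        job.setMapperClass(Mapper.class);

        // Zero reduce tasks: no partition, shuffle, or sort phase runs,
        // and no reducer class needs to be set.
        job.setNumReduceTasks(0);

        // With the default TextInputFormat, the identity mapper emits
        // (LongWritable offset, Text line) pairs.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reduce tasks, each map task writes its records through the job's output format, so the output directory holds one file per map task.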
Hope that helps.
Jie
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jane Wayne 2012-03-08, 05:06
Jie,
so if i set the number of reduce tasks to 0, do i need to specify the reducer (or should i set it to null)? if i don't specify the reducer, and just have a mapper, where do all the mapper output key-value pairs go? do they get serialized to disk/HDFS automagically?
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jie Li 2012-03-08, 05:19
You don't need to specify the reducer at all.
Yeah, the map output will go to HDFS directly. It's called a map-only job.
Jie
Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining
Jane Wayne 2012-03-08, 05:28
thanks Jie. that worked. instead of part-r-00000, i just get part-m-00000. so, that's no problem.
however, i'm going to see if i still get that IOException complaining about no more free disk space.
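One possible explanation for the error, offered as an assumption rather than something confirmed in this thread: the "DFS remaining" figure describes HDFS capacity, while the intermediate data that reduce tasks fetch and merge is written to node-local directories (configured through mapred.local.dir in Hadoop 1.x-era setups). Those local partitions can fill up even when HDFS has room. A minimal sketch for checking usable space on such directories follows; the class name LocalDiskCheck is hypothetical, and the directories to check are whatever paths the node actually uses, passed as arguments.

```java
import java.io.File;

// Hypothetical helper; not part of Hadoop.
public class LocalDiskCheck {
    /**
     * Returns the usable bytes on the partition that holds dir,
     * or 0 if the path does not name a partition (for example, it does not exist).
     */
    public static long usableBytes(String dir) {
        return new File(dir).getUsableSpace();
    }

    public static void main(String[] args) {
        // Falls back to the JVM temp directory when no paths are given.
        String[] dirs = args.length > 0
                ? args
                : new String[] {System.getProperty("java.io.tmpdir")};
        for (String dir : dirs) {
            long megabytes = usableBytes(dir) / (1024L * 1024L);
            System.out.printf("%s: %d MB usable%n", dir, megabytes);
        }
    }
}
```

Running this on each worker node with the configured local directories as arguments shows whether a local partition, rather than HDFS, is the one running out of space.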