Hadoop user mailing list


Vinod Kumar Vavilapalli 2013-01-04, 04:45
Re: more reduce tasks
 Hello,
Thank you for the answer. Exactly: I want the parallelism but a single
final output. What do you mean by "another stage"? I thought I should
set mapred.reduce.tasks large enough and Hadoop would run the reducers
in as many rounds as is optimal. But that isn't the case.
  When I ran the classic WordCount example and set this with
JobConf.setNumReduceTasks(int n), I seemed to get the final output
(there were no duplicates among the normal words, only a few among
strange words). So why doesn't Hadoop run the final reduce in my simple
streaming example?
  Thank you,
  Pavel Hančar
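Why WordCount looks like a single final result: Hadoop's default partitioner (`HashPartitioner`) sends every record with a given key to exactly one reducer, so each word is counted in exactly one `part-*` file and the union of the part files is the final result. With `-mapper /bin/cat`, each whole input line acts as the key (assuming the lines contain no tab), and the reducer `wc -c` just counts the bytes of whatever lines it receives, so each of the 5 reducers prints its own partial total; Hadoop does not add a merging reduce on its own. The bash sketch below imitates this locally, without Hadoop. It is only an illustration: `partition_and_count` is a hypothetical helper, and `cksum` stands in for the Java `hashCode()` that Hadoop actually uses, so the partition numbers will not match a real cluster.

```shell
#!/usr/bin/env bash
# Local, Hadoop-free imitation of a streaming job with several reducers.
# Assumption: cksum stands in for Hadoop's key.hashCode(), so the actual
# partition numbers differ from a real cluster; the principle is the same.
set -euo pipefail

# partition_and_count N FILE  (hypothetical helper)
# Prints one byte count per simulated reducer, in partition order 0..N-1.
partition_and_count() {
  local num_reducers=$1 input=$2 workdir line h p
  workdir=$(mktemp -d)
  # Create all N partitions up front so an empty partition still reports 0.
  for (( p = 0; p < num_reducers; p++ )); do
    : > "$workdir/part-$p"
  done
  # "Shuffle": the whole line is the key, and identical keys always hash
  # to the same partition.
  while IFS= read -r line; do
    h=$(printf '%s' "$line" | cksum | cut -d' ' -f1)
    printf '%s\n' "$line" >> "$workdir/part-$(( h % num_reducers ))"
  done < "$input"
  # "Reduce": run the same command as /tmp/wcc (wc -c) once per partition.
  for (( p = 0; p < num_reducers; p++ )); do
    wc -c < "$workdir/part-$p" | tr -d ' '
  done
  rm -rf "$workdir"
}
```

For instance, a file with the lines `apple`, `banana`, `apple`, `cherry` run with N=3 yields three counts that add up to 26 bytes, and both `apple` lines always end up in the same partition, just as each word ends up in only one WordCount part file.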

2013/1/4 Vinod Kumar Vavilapalli <[EMAIL PROTECTED]>

>
> Is it that you want the parallelism but a single final output? Assuming
> your first job's reducers generate a small output, another stage is the way
> to go. If not, a second stage won't help. What exactly are your objectives?
>
> Thanks,
> +Vinod
>
> On Jan 3, 2013, at 1:11 PM, Pavel Hančar wrote:
>
>   Hello,
> I'd like to use more than one reduce task with Hadoop Streaming and I'd
> like to have only one result. Is it possible? Or should I run one more job
> to merge the results? And is it the same with non-streaming jobs? As you
> can see below, I get 5 results for mapred.reduce.tasks=5.
>
> $ hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar \
>     -D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc \
>     -file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
> .
> .
> .
> 13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
> 13/01/03 22:00:07 INFO streaming.StreamJob: Job complete: job_201301021717_0038
> 13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
> $ hadoop dfs -cat 1gb.wc/part-*
> 472173052
> 165736187
> 201719914
> 184376668
> 163872819
> $
>
> where /tmp/wcc contains
> #!/bin/bash
> wc -c
>
> Thanks for any answer,
>  Pavel Hančar
>
>
>
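Vinod's "another stage" can be illustrated for this particular job: the five partial byte counts are tiny, so they only need to be added up, either on the client or in a second streaming job that uses a single reducer. A minimal sketch, assuming a hypothetical helper `sum_partials` (saved as a script, it could be the hypothetical `/tmp/sum`) and a hypothetical output directory `1gb.wc.total`:

```shell
#!/usr/bin/env bash
# Second-stage reducer sketch (hypothetical): add up the per-reducer counts
# produced by the first job. awk's $1 takes the first whitespace-separated
# field of each line, so any trailing separator is ignored.
sum_partials() {
  awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Option 1: the partial results are tiny, so sum them on the client:
#   hadoop dfs -cat 1gb.wc/part-* | sum_partials
#
# Option 2: a second streaming job with a single reducer, where /tmp/sum is
# a script containing the awk line above (hypothetical names):
#   hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar \
#     -D mapred.reduce.tasks=1 -mapper /bin/cat -reducer /tmp/sum -file /tmp/sum \
#     -input 1gb.wc -output 1gb.wc.total
```

Fed the five numbers shown above, `sum_partials` prints 1187878640. Summing on the client avoids a second job entirely; the single-reducer job is the more general option when the merge itself has to run on the cluster.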
Harsh J 2013-01-05, 07:57
Pavel Hančar 2013-01-05, 14:32
Chen He 2013-01-04, 04:55
bejoy.hadoop@... 2013-01-04, 05:24
Chen He 2013-01-04, 05:32
Robert Dyer 2013-01-04, 05:55