NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # user >> Re: more reduce tasks


Re: more reduce tasks

Do you want the parallelism of several reducers but a single final output? If your first job's reducers produce only a small output, another stage (a second job that merges their results) is the way to go. If not, a second stage won't help. What exactly are your objectives?

Thanks,
+Vinod

On Jan 3, 2013, at 1:11 PM, Pavel Hančar wrote:

>   Hello,
> I'd like to use more than one reduce task with Hadoop Streaming, but I'd like to get only one result. Is that possible? Or should I run one more job to merge the results? And is it the same with non-streaming jobs? As you can see below, I get 5 results for mapred.reduce.tasks=5.
>
> $ hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar  -D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc -file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
> .
> .
> .
> 13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
> 13/01/03 22:00:07 INFO streaming.StreamJob: Job complete: job_201301021717_0038
> 13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
> $ hadoop dfs -cat 1gb.wc/part-*
> 472173052
> 165736187
> 201719914
> 184376668
> 163872819
> $
>
> where /tmp/wcc contains
> #!/bin/bash
> wc -c
>
> Thanks for any answer,
>  Pavel Hančar
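The second stage Vinod suggests can be sketched for Pavel's byte-count example. Each of the five reducers ran `wc -c` on its own partition, so the five part files hold partial counts, and the single result Pavel wants is their sum. The bash function below is a minimal sketch of that merge step, assuming one number per line of input; the name `sum_counts` is a hypothetical illustration, not something from the thread.

```shell
#!/bin/bash
# sum_counts (hypothetical name): reads one number per line on stdin and
# prints their sum, i.e. merges the per-reducer "wc -c" results into one.
# Lines written by a streaming job may carry a trailing tab; `read` strips
# surrounding whitespace, so such lines still parse as plain numbers.
sum_counts() {
  local total=0 n
  # `|| [ -n "$n" ]` keeps a final line that lacks a trailing newline.
  while read -r n || [ -n "$n" ]; do
    # Bash arithmetic uses 64-bit integers on common platforms,
    # so multi-gigabyte byte counts fit.
    total=$((total + n))
  done
  echo "$total"
}
```

Because the first job's output is tiny, the merge can run on the client: `hadoop dfs -cat 1gb.wc/part-* | sum_counts`, which for the five numbers above prints 1187878640. To keep the merge inside Hadoop instead, save the function followed by a call to `sum_counts` as an executable script (say, a hypothetical `/tmp/sum`) and run a second streaming job with a single reducer over the first job's output, for example `hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar -D mapred.reduce.tasks=1 -mapper /bin/cat -reducer /tmp/sum -file /tmp/sum -input '1gb.wc/part-*' -output 1gb.wc.total` (the output directory name is also hypothetical).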

Pavel Hančar 2013-01-04, 08:35
Harsh J 2013-01-05, 07:57
Pavel Hančar 2013-01-05, 14:32
Chen He 2013-01-04, 04:55
bejoy.hadoop@... 2013-01-04, 05:24
Chen He 2013-01-04, 05:32
Robert Dyer 2013-01-04, 05:55