HDFS >> mail # user >> more reduce tasks


more reduce tasks
 Hello,
I'd like to use more than one reduce task with Hadoop Streaming, but I'd
like to get only one result. Is that possible, or should I run one more
job to merge the results? And is it the same with non-streaming jobs? As
you can see below, I get 5 results, one per reduce task, with
mapred.reduce.tasks=5.

$ hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar \
    -D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc \
    -file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
.
.
.
13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
13/01/03 22:00:07 INFO streaming.StreamJob: Job complete: job_201301021717_0038
13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
$ hadoop dfs -cat 1gb.wc/part-*
472173052
165736187
201719914
184376668
163872819
$
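Since the reducer only runs wc -c, the five partial counts above could be merged without a second MapReduce job by adding them up on the client. A minimal sketch, assuming each part file holds a single integer in its first field; sum_counts is a hypothetical helper name, not a Hadoop command:

```shell
#!/bin/bash
# Hypothetical helper: sum the first whitespace-separated field of every
# input line (e.g. the partial byte counts from several part-* files).
# printf "%.0f" avoids the scientific notation some awk builds print for
# large integers.
sum_counts() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Demo on the five per-reducer counts shown above (local data, no Hadoop).
printf '%s\n' 472173052 165736187 201719914 184376668 163872819 | sum_counts
```

On the cluster this would be used as hadoop dfs -cat 1gb.wc/part-* | sum_counts. It only works because byte counts are additive; for a reducer whose outputs cannot be combined that simply, the options remain a follow-up job or a single reducer (-D mapred.reduce.tasks=1), which writes exactly one part file.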

where /tmp/wcc contains:
#!/bin/bash
# Print the number of bytes this reduce task reads on stdin.
wc -c
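Each of the five numbers comes from a separate run of /tmp/wcc: every reduce task receives only its own share of the map output and counts just that share. The local simulation below mimics that fan-out under a simplifying assumption: lines are dealt out round-robin rather than by Hadoop's default hash partitioner on the key, and simulate_reducers is a hypothetical name, not part of Hadoop.

```shell
#!/bin/bash
# Hypothetical simulation of N reduce tasks each running "wc -c" on its own
# partition. Assumption: round-robin assignment (line number modulo N) stands
# in for Hadoop's hash partitioner, which actually assigns records by key.
simulate_reducers() {
  local n="$1" input="$2" tmp f
  tmp="$(mktemp -d)"
  # Write line NR to partition file part-(NR % n) inside the temp directory.
  awk -v n="$n" -v dir="$tmp" '{ print > (dir "/part-" (NR % n)) }' "$input"
  # Act as one "reducer" per partition: print that partition's byte count.
  for f in "$tmp"/part-*; do
    if [ -e "$f" ]; then
      wc -c < "$f"
    fi
  done
  rm -rf "$tmp"
}

# Demo on a small local file: three "reducers", three partial counts.
demo_input="$(mktemp)"
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > "$demo_input"
simulate_reducers 3 "$demo_input"
rm -f "$demo_input"
```

Changing the first argument changes the number of outputs, much as mapred.reduce.tasks does for the real job.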

Thanks for any answer,
 Pavel Hančar