HDFS, mail # user - more reduce tasks

more reduce tasks
Pavel Hančar 2013-01-03, 21:11
I'd like to use more than one reduce task with Hadoop Streaming, but still
end up with a single result. Is that possible, or do I have to run one more
job to merge the results? And is it the same for non-streaming jobs? As you
can see below, I get 5 results with mapred.reduce.tasks=5.

$ hadoop jar
-D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc
-file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
13/01/03 22:00:07 INFO streaming.StreamJob: Job complete:
13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
$ hadoop dfs -cat 1gb.wc/part-*

where /tmp/wcc contains
wc -c
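
To illustrate the merge step I am asking about: since each reducer only prints a byte count, the partial results could probably be summed on the client instead of in a second job. This is only a sketch, assuming every part file holds a single number as printed by wc -c; sum_counts is a made-up helper name, not part of Hadoop.

```shell
#!/bin/sh
# sum_counts (hypothetical helper): reads one count per line on stdin
# and prints the total. awk's $1 ignores the leading whitespace that
# some wc implementations put in front of the number.
sum_counts() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Intended use against the job output from this post (needs a cluster):
#   hadoop dfs -cat 1gb.wc/part-* | sum_counts
```

This works only because the reduce operation here is a plain sum; a reducer whose outputs cannot be combined this simply would still need a single reduce task or a follow-up job.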

Thanks for any answer,
 Pavel Hančar