HDFS >> mail # user >> more reduce tasks


more reduce tasks
Hello,
I'd like to use more than one reduce task with Hadoop Streaming, but still
end up with only one result. Is that possible, or should I run one more job
to merge the results? And is it the same for non-streaming jobs? As you can
see below, I get 5 result files with mapred.reduce.tasks=5.

$ hadoop jar
/packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar
-D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc
-file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
.
.
.
13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
13/01/03 22:00:07 INFO streaming.StreamJob: Job complete:
job_201301021717_0038
13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
$ hadoop dfs -cat 1gb.wc/part-*
472173052
165736187
201719914
184376668
163872819
$

where /tmp/wcc contains
#!/bin/bash
wc -c

Thanks for any answer,
 Pavel Hančar