Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> How to make a MapReduce job with no input?


Copy link to this message
-
Re: How to make a MapReduce job with no input?
The default # of map tasks is set to 2 (via mapred.map.tasks from
mapred-default.xml) - which explains your 2-map run for even one line
of text.

For running with no inputs, take a look at Sleep Job's EmptySplits
technique on trunk:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/SleepJob.java?view=markup
(~line 70)

On Fri, Mar 1, 2013 at 2:46 AM, Mike Spreitzer <[EMAIL PROTECTED]> wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does not
> really depend on any input (the job conf supplies all the info needed in
> Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..) reads
> all the real input from the JobConf, and MyMapper.map(..) ignores the given
> key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
>
> Thanks,
> Mike

--
Harsh J
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB