MapReduce >> mail # user >> How to make a MapReduce job with no input?


Thread:
- Mike Spreitzer 2013-02-28, 21:16
- Harsh J 2013-03-01, 04:15
- David Boyd 2013-03-01, 01:39
- Mike Spreitzer 2013-02-28, 21:25
- Jeff Kubina 2013-03-01, 01:41

Re: How to make a MapReduce job with no input?
I made a DualInputFormat for Hive: https://github.com/edwardcapriolo/DualInputFormat.
It always returns 1 split with 1 row. You can write the same type of
thing to create N splits.
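The idea above can be sketched against the old `org.apache.hadoop.mapred` API that Mike is using. This is a hypothetical minimal version (not the actual DualInputFormat source; the class name and internals are illustrative): `getSplits` ignores the configured input paths and always returns a single empty split, and the record reader hands the map task exactly one synthetic record, so the job runs one map task with no real input file at all.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Illustrative name; not the actual DualInputFormat implementation.
public class SingleRecordInputFormat
    implements InputFormat<NullWritable, NullWritable> {

  // A split that carries no data and prefers no particular host.
  public static class EmptySplit implements InputSplit {
    public long getLength() { return 0; }
    public String[] getLocations() { return new String[0]; }
    public void write(DataOutput out) throws IOException { }
    public void readFields(DataInput in) throws IOException { }
  }

  public InputSplit[] getSplits(JobConf conf, int numSplits) {
    // Always one split -> exactly one map task, whatever the input paths say.
    return new InputSplit[] { new EmptySplit() };
  }

  public RecordReader<NullWritable, NullWritable> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) {
    return new RecordReader<NullWritable, NullWritable>() {
      private boolean done = false;
      public boolean next(NullWritable key, NullWritable value) {
        if (done) return false;   // exactly one record, then end of input
        done = true;
        return true;
      }
      public NullWritable createKey() { return NullWritable.get(); }
      public NullWritable createValue() { return NullWritable.get(); }
      public long getPos() { return done ? 1 : 0; }
      public void close() { }
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}
```

With this, `MyMapper.map(..)` is invoked once with null key/value and can do all its work from the JobConf, and no dummy input file is needed.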

On Thu, Feb 28, 2013 at 8:41 PM, Jeff Kubina <[EMAIL PROTECTED]> wrote:
> Mike,
>
> To do this for the more general case of creating N map tasks, with each task
> receiving the one record <i, n>, where i ranges from 0 to n-1, I wrote an
> InputFormat, InputSplit, and RecordReader Hadoop class. The sample code is
> here. I think I wrote those for Hadoop 0.19, so they may need some tweaking
> for subsequent versions.
>
> Jeff
>
> On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <[EMAIL PROTECTED]> wrote:
>>
>> On closer inspection, I see that of my two tasks: the first processes 1
>> input record and the other processes 0 input records.  So I think this
>> solution is correct.  But perhaps it is not the most direct way to get the
>> job done?
>>
>>
>>
>>
>> From:        Mike Spreitzer/Watson/IBM@IBMUS
>> To:        [EMAIL PROTECTED],
>> Date:        02/28/2013 04:18 PM
>> Subject:        How to make a MapReduce job with no input?
>> ________________________________
>>
>>
>>
>> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
>> not really depend on any input (the job conf supplies all the info needed in
>> Mapper).  What is a good way to do this?
>>
>> What I have done so far is write a job in which MyMapper.configure(..)
>> reads all the real input from the JobConf, and MyMapper.map(..) ignores the
>> given key and value, writing the output implied by the JobConf.  I set the
>> InputFormat to TextInputFormat and the input paths to be a list of one
>> filename; the named file contains one line of text (the word "one"),
>> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
>> find it has two map tasks --- one reads the first two bytes of my non-input
>> file, and the other reads the last two bytes of my non-input file!  How can I
>> make a job with just one map task?
>>
>> Thanks,
>> Mike
>
>
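Jeff's more general scheme (N splits, each delivering the single record <i, n> to its map task) can be sketched against the same old-API interfaces. This is a hypothetical reconstruction, not his linked sample code; the class name and the `example.num.splits` conf key are made up for illustration:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Illustrative name; not Jeff's actual sample code.
public class IndexedInputFormat implements InputFormat<IntWritable, IntWritable> {

  // Split i of n; serialized so the framework can ship it to the task.
  public static class IndexSplit implements InputSplit {
    private int i, n;
    public IndexSplit() { }                  // required for deserialization
    public IndexSplit(int i, int n) { this.i = i; this.n = n; }
    public long getLength() { return 0; }
    public String[] getLocations() { return new String[0]; }
    public void write(DataOutput out) throws IOException {
      out.writeInt(i); out.writeInt(n);
    }
    public void readFields(DataInput in) throws IOException {
      i = in.readInt(); n = in.readInt();
    }
  }

  public InputSplit[] getSplits(JobConf conf, int numSplits) {
    // "example.num.splits" is an illustrative conf key, not a Hadoop one.
    int n = conf.getInt("example.num.splits", numSplits);
    InputSplit[] splits = new InputSplit[n];
    for (int i = 0; i < n; i++) splits[i] = new IndexSplit(i, n);
    return splits;
  }

  public RecordReader<IntWritable, IntWritable> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) {
    final IndexSplit s = (IndexSplit) split;
    return new RecordReader<IntWritable, IntWritable>() {
      private boolean done = false;
      public boolean next(IntWritable key, IntWritable value) {
        if (done) return false;   // exactly one record per task: <i, n>
        key.set(s.i);
        value.set(s.n);
        done = true;
        return true;
      }
      public IntWritable createKey() { return new IntWritable(); }
      public IntWritable createValue() { return new IntWritable(); }
      public long getPos() { return done ? 1 : 0; }
      public void close() { }
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}
```

Setting n to 1 recovers the single-map-task case Mike asked about; each map task then learns its index i from its one input record rather than from the JobConf.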