MapReduce, mail # user - How to make a MapReduce job with no input?


Thread replies:
Mike Spreitzer 2013-02-28, 21:16
Harsh J 2013-03-01, 04:15
David Boyd 2013-03-01, 01:39
Mike Spreitzer 2013-02-28, 21:25
Jeff Kubina 2013-03-01, 01:41
Re: How to make a MapReduce job with no input?
Edward Capriolo 2013-03-01, 01:46
I made DualInputFormat (https://github.com/edwardcapriolo/DualInputFormat)
for Hive. It always returns 1 split with 1 row. You can write the same type
of thing to create N splits.
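The "same type of thing" Edward mentions can be sketched for the old mapred API of Hadoop 1.x. This is a hypothetical illustration, not Edward's or Jeff's actual code: class and conf-key names (`NRecordInputFormat`, `nrecord.count`) are invented. It produces N splits with no backing file, each delivering exactly one record `<i, n>` as Jeff describes below; with N = 1 it behaves like DualInputFormat.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class NRecordInputFormat implements InputFormat<IntWritable, IntWritable> {

  /** A split that carries only its index i and the total count n; no HDFS data. */
  public static class IndexSplit implements InputSplit {
    int i, n;
    public IndexSplit() {}                        // no-arg ctor needed for deserialization
    IndexSplit(int i, int n) { this.i = i; this.n = n; }
    public long getLength() { return 0; }         // nothing to read
    public String[] getLocations() { return new String[0]; }  // no data locality
    public void write(DataOutput out) throws IOException {
      out.writeInt(i); out.writeInt(n);
    }
    public void readFields(DataInput in) throws IOException {
      i = in.readInt(); n = in.readInt();
    }
  }

  public InputSplit[] getSplits(JobConf job, int numSplits) {
    int n = job.getInt("nrecord.count", 1);       // hypothetical conf key; defaults to one split
    InputSplit[] splits = new InputSplit[n];
    for (int i = 0; i < n; i++) splits[i] = new IndexSplit(i, n);
    return splits;
  }

  public RecordReader<IntWritable, IntWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    final IndexSplit s = (IndexSplit) split;
    return new RecordReader<IntWritable, IntWritable>() {
      boolean done = false;
      public boolean next(IntWritable key, IntWritable value) {
        if (done) return false;                   // emit exactly one record per split
        key.set(s.i);
        value.set(s.n);
        done = true;
        return true;
      }
      public IntWritable createKey() { return new IntWritable(); }
      public IntWritable createValue() { return new IntWritable(); }
      public long getPos() { return 0; }
      public void close() {}
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}
```

Each mapper then sees a single (i, n) pair and can pull everything else it needs from the JobConf in configure().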

On Thu, Feb 28, 2013 at 8:41 PM, Jeff Kubina <[EMAIL PROTECTED]> wrote:
> Mike,
>
> To do this for the more general case of creating N map tasks, with each task
> receiving the one record <i, n>, where i ranges from 0 to n-1, I wrote
> InputFormat, InputSplit, and RecordReader Hadoop classes. The sample code is
> here. I think I wrote those for Hadoop 0.19, so they may need some tweaking
> for subsequent versions.
>
> Jeff
>
> On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <[EMAIL PROTECTED]> wrote:
>>
>> On closer inspection, I see that of my two tasks, the first processes 1
>> input record and the other processes 0 input records.  So I think this
>> solution is correct.  But perhaps it is not the most direct way to get the
>> job done?
>>
>> From:        Mike Spreitzer/Watson/IBM@IBMUS
>> To:        [EMAIL PROTECTED],
>> Date:        02/28/2013 04:18 PM
>> Subject:        How to make a MapReduce job with no input?
>> ________________________________
>>
>>
>>
>> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
>> not really depend on any input (the job conf supplies all the info needed in
>> Mapper).  What is a good way to do this?
>>
>> What I have done so far is write a job in which MyMapper.configure(..)
>> reads all the real input from the JobConf, and MyMapper.map(..) ignores the
>> given key and value, writing the output implied by the JobConf.  I set the
>> InputFormat to TextInputFormat and the input paths to be a list of one
>> filename; the named file contains one line of text (the word "one"),
>> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
>> find it has two map tasks --- one reads the first two bytes of my non-input
>> file, and the other reads the last two bytes of my non-input file!  How can I
>> make a job with just one map task?
>>
>> Thanks,
>> Mike
>
>
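As to why Mike's one-line file produced two map tasks: in Hadoop 1.x, FileInputFormat divides the input by the numSplits hint, and mapred.map.tasks defaults to 2, so the 4-byte file ("one" plus a newline) is likely being cut into two 2-byte splits. If so, setting the hint to 1 should yield a single task even with the dummy-file approach. A minimal sketch, assuming Hadoop 1.x behavior; MyJob is a placeholder class name:

```java
import org.apache.hadoop.mapred.JobConf;

// Placeholder driver class; substitute your own job class.
JobConf conf = new JobConf(MyJob.class);

// numSplits is only a hint to InputFormat.getSplits(), but
// FileInputFormat honors it when computing its goal split size,
// so a tiny file ends up in one split.
conf.setNumMapTasks(1);
```

A custom no-input InputFormat like the ones discussed above avoids the dummy file entirely, which is arguably the more direct route.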