MapReduce >> mail # user >> How to make a MapReduce job with no input?
Re: How to make a MapReduce job with no input?
Below is some code I use.  The fftbench.map.iterations setting is the number of
fake records supplied to each mapper, and you control the number of mappers via
the JobConf (setNumMapTasks feeds the numSplits argument of getSplits).
>  public static class EmptySplit implements InputSplit {
>     public void write(DataOutput out) throws IOException { }
>     public void readFields(DataInput in) throws IOException { }
>     public long getLength() { return 0L; }
>     public String[] getLocations() { return new String[0]; }
>   }
>
>   public static class FFTBenchInputFormat extends Configured
>       implements InputFormat<IntWritable,IntWritable> {
>     public InputSplit[] getSplits(JobConf conf, int numSplits) {
>       InputSplit[] ret = new InputSplit[numSplits];
>       for (int i = 0; i < numSplits; ++i) {
>         ret[i] = new EmptySplit();
>       }
>       return ret;
>     }
>     public RecordReader<IntWritable,IntWritable> getRecordReader(
>         InputSplit ignored, JobConf conf, Reporter reporter)
>         throws IOException {
>       final int size = conf.getInt("fftbench.map.size", 1);
>       if (size < 0) throw new IOException("Invalid map size: " + size);
>       final int iterations = conf.getInt("fftbench.map.iterations", 1);
>       if (iterations < 0) throw new IOException("Invalid map iterations: " + iterations);
>       return new RecordReader<IntWritable,IntWritable>() {
>         private int records = 0;
>         private int emitCount = 0;
>
>         public boolean next(IntWritable key, IntWritable value)
>             throws IOException {
>           key.set(size);
>           int emit = emitCount++;
>           value.set(emit);
>           return records++ < iterations;
>         }
>         public IntWritable createKey() { return new IntWritable(); }
>         public IntWritable createValue() { return new IntWritable(); }
>         public long getPos() throws IOException { return records; }
>         public void close() throws IOException { }
>         public float getProgress() throws IOException {
>           return records / ((float)iterations);
>         }
>       };
>     }
>   }
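For completeness, here is a rough driver sketch showing how the pieces above
fit together under the old mapred API.  The class names FFTBench and
FFTBenchMapper are my own placeholders, not from the quoted code; the rest
uses standard Hadoop 1.0 calls.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

JobConf conf = new JobConf(FFTBench.class);        // FFTBench: assumed driver class
conf.setJobName("fftbench");
conf.setInputFormat(FFTBenchInputFormat.class);    // no real input files needed
conf.setNumMapTasks(4);                            // becomes numSplits in getSplits()
conf.setInt("fftbench.map.size", 1024);            // read by the RecordReader
conf.setInt("fftbench.map.iterations", 10);        // fake records per mapper
conf.setMapperClass(FFTBenchMapper.class);         // your mapper (assumed name)
conf.setNumReduceTasks(0);                         // map-only job
conf.setOutputFormat(NullOutputFormat.class);      // discard output if unwanted
JobClient.runJob(conf);
```

With this setup each of the 4 mappers sees 10 synthetic (size, emitCount)
records and never touches HDFS input.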
On 2/28/2013 4:16 PM, Mike Spreitzer wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info
> needed in Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores
> the given key and value, writing the output implied by the JobConf.  I
> set the InputFormat to TextInputFormat and the input paths to be a list
> of one filename; the named file contains one line of text (the word
> "one"), terminated by a newline.  When I run this job (on Linux,
> hadoop-1.0.0), I find it has two map tasks --- one reads the first two
> bytes of my non-input file, and the other reads the last two bytes of my
> non-input file!  How can I make a job with just one map task?
>
> Thanks,
> Mike

--
========= mailto:[EMAIL PROTECTED] ===========David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ===========
