Hadoop >> mail # user >> Set variables in mapper


Re: Set variables in mapper
Hi,

It would also be worthwhile to look at the Tool interface
(http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Tool),
which the bundled MapReduce example programs also use. Run through
ToolRunner, it lets arbitrary configuration values be passed on the
command line using the -Dvar.name=var.value convention.

Thanks
Hemanth
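For reference, a minimal sketch of the pattern Hemanth describes, assuming the old mapred API used elsewhere in this thread (the class name MyDriver and property name var.name are illustrative, not from the original posts). ToolRunner applies GenericOptionsParser, which moves any -D pairs into the Configuration before run() sees the remaining arguments:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch only. Invoke as, e.g.:
//   hadoop jar myjob.jar MyDriver -D var.name=5 <input> <output>
// ToolRunner/GenericOptionsParser strip the -D option into the
// Configuration, so run() never has to parse it by hand.
public class MyDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    // Value arrives via -D on the command line; 1 is the fallback default.
    int n = conf.getInt("var.name", 1);
    // ... set mapper, formats, and input/output paths from args, then:
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}
```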

On Mon, Aug 2, 2010 at 10:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> And since it is an integer you're looking for, use the utility methods
> JobConf.setInt and JobConf.getInt, called on your conf instance (they
> are instance methods, not statics):
>
> int n = Integer.parseInt(args[2]);
> conf.setInt("your.pack.some.name", n);
>
> And in the Mapper, override "void configure(JobConf conf)" and do:
> this.n = conf.getInt("your.pack.some.name", 1 /* Or other default value */);
>
> On Mon, Aug 2, 2010 at 9:53 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>> On Mon, Aug 2, 2010 at 12:17 PM, Erik Test <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I'm trying to set a variable in my mapper class by reading an argument from
>>> the command line and then passing the entry to the mapper from main. Is this
>>> possible?
>>>
>>>  public static void main(String[] args) throws Exception
>>>  {
>>>    JobConf conf = new JobConf(DistanceCalc2.class);
>>>    conf.setJobName("Calculate Distances");
>>>
>>>    conf.setOutputKeyClass(Text.class);
>>>    conf.setOutputValueClass(DoubleWritable.class);
>>>
>>>    conf.setMapperClass(Map.class);
>>>    //conf.setReducerClass(Reduce.class);
>>>
>>>    conf.setInputFormat(TextInputFormat.class);
>>>    conf.setOutputFormat(TextOutputFormat.class);
>>>
>>>    FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>
>>>    Map.setN(args[2]);
>>>
>>>    JobClient.runJob(conf);
>>>  }//main
>>>
>>>
>>>  public static class Map extends MapReduceBase
>>>    implements Mapper<LongWritable, Text,
>>>      Text, DoubleWritable>
>>>        {
>>>               ...
>>>               private static int N;
>>>
>>>               ...
>>>
>>>               public void map(LongWritable key, Text value,
>>>                 OutputCollector<Text, DoubleWritable> output,
>>>                  Reporter reporter) throws IOException
>>>                {
>>>                    ....
>>>                    dim = tokens.length / N;
>>>                    ...
>>>                }
>>>
>>>               public static void setN(String newN)
>>>               {
>>>                  N = Integer.parseInt(newN);
>>>               }
>>>        }
>>>
>>> I've tried the code above but I get an error saying that I'm dividing by
>>> zero. Obviously, the argument I enter for N isn't being set as specified.
>>> Erik
>>>
>>
>> You can pass variables to the Job using the JobConf class.
>>
>> In your Driver class:
>> jobConf.set("clone_path", clonePath);
>>
>> Then in your mapper / reducer, override configure:
>>
>>  private JobConf jobConf;
>>  public void configure(JobConf jobConf) {
>>        super.configure(jobConf);
>>        this.jobConf=jobConf;
>>  }
>>
>
>
>
> --
> Harsh J
> www.harshj.com
>
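Putting the two replies together, here is a minimal sketch of the corrected driver and mapper from Erik's post (the property name "distancecalc.n" and the tokenizing on whitespace are illustrative assumptions; the elided distance logic stays elided). The key change: N travels inside the JobConf, which Hadoop ships to every task JVM, instead of a static field that is only set in the client JVM and is therefore zero in the mappers:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class DistanceCalc2 {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    private int n;

    @Override
    public void configure(JobConf conf) {
      // Runs once per task, after the JobConf has been serialized
      // out to the task JVM, so the value set in main() is visible here.
      n = conf.getInt("distancecalc.n", 1); // property name is illustrative
    }

    public void map(LongWritable key, Text value,
        OutputCollector<Text, DoubleWritable> output,
        Reporter reporter) throws IOException {
      String[] tokens = value.toString().split("\\s+"); // tokenizing assumed
      int dim = tokens.length / n; // n is now non-zero in every task
      // ... distance computation elided, as in the original post ...
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(DistanceCalc2.class);
    conf.setJobName("Calculate Distances");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(DoubleWritable.class);
    conf.setMapperClass(Map.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Fix: store N in the job configuration rather than a static field.
    conf.setInt("distancecalc.n", Integer.parseInt(args[2]));

    JobClient.runJob(conf);
  }
}
```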