Thanks.  map.input.file is exactly what I need.

  One more question.  Is there a way to ignore a file in an input path?
For example, suppose the data in Hadoop is stored in a directory
structure /<date>/<machine>.txt.  If on Dec 1, 2008 I have a file from
each of machines a and b, I would have the following directory


   What I'd like to do is have a job that, depending on the
configuration, would either process all files or files for a given
machine only ( say a, but not b ).  
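
   A minimal sketch of one way the selection above might be expressed,
assuming the job's input path is built as a glob over the
/<date>/<machine>.txt layout (Hadoop input paths accept glob patterns).
The class InputGlob, the method forMachine, and the configuration key
named in the comment are hypothetical and not part of Hadoop.

```java
// Hypothetical helper: builds an input glob for the /<date>/<machine>.txt layout.
final class InputGlob {
    private InputGlob() {}

    // root    : directory that holds the per-date directories, e.g. "/"
    // machine : machine name to select, or null/empty to select every machine
    static String forMachine(String root, String machine) {
        // Drop a trailing slash so "/" and "/data/" do not produce "//".
        String base = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        String file = (machine == null || machine.isEmpty()) ? "*" : machine;
        // The first "*" matches any date directory, the second part picks the file(s).
        return base + "/*/" + file + ".txt";
    }
}

// In the job driver (not compiled here; "example.machine" is an assumed key):
//   String machine = conf.get("example.machine");
//   FileInputFormat.setInputPaths(conf, new Path(InputGlob.forMachine("/", machine)));
```

   With machine set to a, the job would read only files named a.txt
under each date directory; with it unset, every .txt file is read.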

   Is that possible to do or am I trying to do something that's using
Hadoop in a way that it's not intended to be used?  I looked briefly at
MultipleInputs which seems to be able to handle different input paths,
but not handle a single input path in different ways depending on

   Thanks again.


-----Original Message-----
From: Devaraj Das [mailto:[EMAIL PROTECTED]]
Sent: Sunday, December 07, 2008 12:11 PM
Subject: Re: Can mapper get access to filename being processed?
On 12/7/08 11:32 PM, "Andy Sautins" <[EMAIL PROTECTED]> wrote:
> if I'm just not looking at the right place or if I'm thinking about the
> problem in the wrong way.  Any insight would be appreciated.
>    Let's say I have a directory of files that contains a combination of
> different file types.  The MapReduce job needs to process all files in
> the directory but generates different key/value pairs depending on the
> file being processed.  What I'd like to do is use the filename to
> identify the file type being processed and use that information in the
> map job.  It seems like what I'd want is for the map job to have access
> to the filename of the input file split being processed.  I haven't been
> able to find out if that is available to a derived class of
> MapReduceBase.
That's map.input.file, available in the map via JobConf. The mapper class
needs to override the implementation of configure in MapReduceBase and get
the filename via JobConf.get("map.input.file"). Store that in some field
variable of your mapper class. You can then inspect that in your map
method.
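
A minimal sketch of the filename check described above, not a definitive
implementation. It covers only the string handling; the Hadoop wiring is
left in comments. The class InputFileType, the Kind values, and the
.log/.csv extensions are assumptions made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper: classifies an input file by the name reported in map.input.file.
final class InputFileType {
    // Illustrative file types; a real job would define its own.
    enum Kind { LOG, CSV, UNKNOWN }

    private InputFileType() {}

    // Returns the last path component, e.g. "a.log" for "hdfs://host/dir/a.log".
    static String baseName(String inputFile) {
        int slash = inputFile.lastIndexOf('/');
        return slash >= 0 ? inputFile.substring(slash + 1) : inputFile;
    }

    // Maps the file name to a Kind using its extension (case-insensitive).
    static Kind kindOf(String inputFile) {
        if (inputFile == null) {
            return Kind.UNKNOWN;
        }
        String name = baseName(inputFile).toLowerCase(Locale.ROOT);
        if (name.endsWith(".log")) {
            return Kind.LOG;
        }
        if (name.endsWith(".csv")) {
            return Kind.CSV;
        }
        return Kind.UNKNOWN;
    }
}

// In a mapper that extends MapReduceBase (not compiled here):
//   private InputFileType.Kind kind;
//   public void configure(JobConf job) {
//       kind = InputFileType.kindOf(job.get("map.input.file"));
//   }
// The map method can then branch on the stored kind field.
```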

>    Does what I'm trying to do make sense or is there a better way of
> processing a job like the one I'm describing?
Look at MultipleInputs class (in the mapred.lib directory). That could
>    Thank you
>    Andy