-
The name of the current input file during a map
Saptarshi Guha 2009-11-26, 07:05
Hello, I have a set of input files part-r-* which I will pass through another map (no reduce). The part-r-* files consist of key/value pairs, the keys being small and the values fairly large (MBs).
I would like to index these, i.e. run a map whose output is the key and the /filename/, i.e. which part-r-* file the particular key belongs to, so that if I need them again I can just access that file.
Q: In the map stage, how do I retrieve the name of the file being processed? I'd rather not use the MapFileOutputFormat.
Hadoop 0.21
Regards Saptarshi
-
Re: The name of the current input file during a map
Amogh Vasekar 2009-11-26, 07:10
conf.get("map.input.file") is what you need.
Amogh
-
Re: The name of the current input file during a map
Saptarshi Guha 2009-11-26, 07:13
Thank you. Regards Saptarshi
-
Re: The name of the current input file during a map
Saptarshi Guha 2009-11-26, 07:27
Hello again, I'm using Hadoop 0.21 and its context object, e.g.
    public void setup(Context context) {
        Configuration cfg = context.getConfiguration();
        System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
    }
displays null, so maybe this fell out by mistake in the API change? Regards Saptarshi
-
Re: The name of the current input file during a map
Amogh Vasekar 2009-11-26, 08:23
-"mapred.input.file"
+"map.input.file"
Should work.
Amogh
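For readers on the old API (org.apache.hadoop.mapred), a minimal sketch of reading that property might look like the following. The class name InputFileAwareBase and the field inputFile are hypothetical, and this assumes the job reads file-based input so that the framework sets "map.input.file" for each map task; with other input formats the value may well be null.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Hypothetical base class for an old-API mapper (org.apache.hadoop.mapred).
// A concrete mapper would extend it and also implement Mapper<K1, V1, K2, V2>.
public class InputFileAwareBase extends MapReduceBase {

    // Path of the file backing the current split, or null if it was not set.
    protected String inputFile;

    @Override
    public void configure(JobConf job) {
        // Assumption: the input is file-based, so the framework has populated
        // "map.input.file" before configure() is called for this task.
        inputFile = job.get("map.input.file");
    }
}
```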
-
Re: The name of the current input file during a map
Owen O'Malley 2009-11-26, 16:13
On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote:
> I'm using Hadoop 0.21 and its context object
In the new API you can re-write that as:
((FileSplit) context.getInputSplit()).getPath()
-- Owen
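Putting the suggestion above into the indexing job described at the start of the thread, a new-API mapper could be sketched roughly as follows. The class name FileIndexMapper and the Text key/value types are assumptions made for illustration (the thread does not say which types the part-r-* files hold), and the cast to FileSplit only holds when the job uses a FileInputFormat-based input format.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: emits (key, name of the part-r-* file holding that key).
// Text key/value types are an assumption; substitute the job's real types.
public class FileIndexMapper extends Mapper<Text, Text, Text, Text> {

    // Reused for every record; the file is fixed for the lifetime of the task.
    private final Text fileName = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        // The cast is only valid for FileInputFormat-based inputs, so check first.
        if (!(split instanceof FileSplit)) {
            throw new IOException("Expected a FileSplit but got " + split.getClass().getName());
        }
        // getPath() returns the full path; getName() keeps just the last component.
        fileName.set(((FileSplit) split).getPath().getName());
    }

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // The large value is dropped; only the key and its file name are written.
        context.write(key, fileName);
    }
}
```

Resolving the file name once in setup() rather than in every map() call works because a map task processes a single split, and a FileSplit refers to a single file. To keep the full path instead of the bare name, use getPath().toString().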