Hadoop >> mail # user >> The name of the current input file during a map


The name of the current input file during a map
Hello,
I have a set of input files part-r-* which I will pass through another
map (no reduce). The part-r-* files consist of key/value pairs, the keys
being small and the values fairly large (MBs).

I would like to index these, i.e. run a map whose output is the key and
the /filename/, i.e. which part-r-* file that particular key belongs to,
so that if I need a value again I can just access that file.

Q: In the map stage, how do I retrieve the name of the file being
processed? I'd rather not use the MapFileOutputFormat.

Hadoop 0.21

Regards
Saptarshi
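
A minimal sketch of the index map described above, with the Hadoop
plumbing left out so that it runs standalone. The names KeyFileIndex,
baseName and indexKeys, and the example path, are hypothetical. In an
actual job on the org.apache.hadoop.mapreduce API, the path would
presumably come from the input split, i.e.
((FileSplit) context.getInputSplit()).getPath(), assuming a
FileInputFormat-based input; that should be checked against the 0.21 API
before relying on it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyFileIndex {

    /** Returns the last path component, e.g. "part-r-00003". */
    static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    /**
     * Models the map step: for every key read from one input file,
     * emit (key, file name) and drop the large value.
     * The inputPath argument stands in for the path a real Mapper
     * would obtain from its input split (an assumption, see above).
     */
    static Map<String, String> indexKeys(String inputPath, List<String> keys) {
        String fileName = baseName(inputPath);
        Map<String, String> index = new LinkedHashMap<>();
        for (String key : keys) {
            index.put(key, fileName);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical path and keys, for illustration only.
        Map<String, String> index =
            indexKeys("hdfs://namenode/out/part-r-00003", Arrays.asList("k1", "k2"));
        System.out.println(index); // prints {k1=part-r-00003, k2=part-r-00003}
    }
}
```

Resolving the file name once per input file rather than once per record
keeps the per-record work to a single emit, which matters when the
values themselves are large and are being discarded anyway.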