
How to get/operate the InputFileName in pig 0.8.1

I have some files in hdfs://path/load/ like this:
These files are generated by other M/R jobs. Each file contains only one
column, and the number in the file name between 'file_' and '_00001' is an id.
I want to add the id to each input record, like this (I think I need to
write a LoadFunc to get the id):
a = load '/path/load/' using com.company.pig.GetIDFromFileName();
dump a;
-- here the relation 'a' will have two columns: one is the original column
-- and the other is the id.
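
To make the intent concrete, here is a minimal sketch of only the id-extraction step, in plain Java with no Pig classes. The class name FileNameId, the method extractId, and the exact pattern file_<digits>_00001 are assumptions made for illustration, since the real file names are not shown above. In a custom LoadFunc for 0.8.x, the path could presumably be taken in prepareToRead() from the PigSplit's wrapped FileSplit, but that wiring is not shown or tested here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameId {
    // Assumed naming scheme: file_<digits>_00001 (hypothetical; adjust to the real names).
    private static final Pattern NAME = Pattern.compile("file_(\\d+)_00001");

    /** Returns the id embedded in the last path component, or empty if the name does not match. */
    static Optional<String> extractId(String path) {
        // Keep only the part after the last '/', i.e. the file name itself.
        String name = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("hdfs://path/load/file_1234_00001"));
        System.out.println(extractId("hdfs://path/load/_logs"));
    }
}
```

The loader would then append the extracted id as a second field to every tuple it reads from that file.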

My questions are:
1. Is there an existing function that can get the id from the file name?
2. I think this method in Pig 0.6.0 could help me:
    (fileName, org.apache.pig.impl.io.BufferedPositionedInputStream
    <http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/impl/io/BufferedPositionedInputStream.html>,
    long offset, long end)
          Specifies a portion of an InputStream to read tuples.
but I can't find the same method in Pig 0.8.1.
Which method in the Pig 0.8.1 API can I use to work with the input file name?

Thanks very much.