Pig, mail # user - How to get/operate the InputFileName in pig 0.8.1


How to get/operate the InputFileName in pig 0.8.1
Jameson Li 2011-06-13, 11:07
Hi,

I have some files in hdfs://path/load/ like this:
file_29_00001
file_47_00001
file_16_00001
...
These files are generated by other M/R jobs. Each file contains only one
column, and the number in the file name between 'file_' and '_00001' is an
id.
I want to add the id to each record when loading, like this (I think I need to
write a LoadFunc to get the id):
a = load '/path/load/' using com.company.pig.GetIDFromFileName();
dump a;
// here the relation 'a' will have two columns: one is the original column and
// the other is the id.
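
To make the id extraction concrete, here is a minimal sketch of the parsing step only, assuming every file name matches file_<id>_<sequence>. The class name FileNameId and the method extractId are hypothetical names chosen for illustration; this is not part of the Pig API and it does not show how to obtain the file name inside a LoadFunc.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameId {
    // Assumed naming scheme: file_<id>_<sequence>, e.g. file_29_00001.
    private static final Pattern NAME = Pattern.compile("^file_(\\d+)_\\d+$");

    /**
     * Returns the id embedded in the last path component,
     * or null if the name does not follow the assumed scheme.
     */
    public static String extractId(String path) {
        // Keep only the part after the last '/', so full HDFS paths also work.
        String name = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("hdfs://path/load/file_29_00001"));
        System.out.println(extractId("file_47_00001"));
    }
}
```

The id is returned as a string so that it can be appended to each tuple as an extra column without a numeric conversion.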

My questions are:
1. Is there an existing function that can get the id from the file name?
2. I think this PigStorage method in Pig 0.6.0 could help me:
bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
          Specifies a portion of an InputStream to read tuples.
<http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/builtin/PigStorage.html#bindTo(java.lang.String, org.apache.pig.impl.io.BufferedPositionedInputStream, long, long)>
but I can't find the same method in Pig 0.8.1.
Which method in the Pig 0.8.1 API can I use to access the input file name?

Thanks very much.