I have some files in hdfs://path/load/ like this:
These files are generated by other M/R jobs. Each file contains only one
column, and the number in the file name between 'file_' and '_00001' is a
I want to add the id to the loaded data like this (I think I need to
write a LoadFunc to get the id):
a = load '/path/load/' using com.company.pig.GetIDFromFileName();
// here the relation 'a' will have two columns: one is the original column and
// the other is the id.
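
To make the id part concrete, here is a minimal sketch in plain Java of how the id could be pulled out of a file name. It is not tied to any Pig API. The class name FileNameId, the method extractId, and the exact name layout file_<id>_<digits> are assumptions made for illustration only, based on the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: extracts the id from a name
// assumed to look like file_<id>_00001.
final class FileNameId {
    // Assumed layout: "file_", the numeric id, "_", then a numeric suffix.
    private static final Pattern NAME = Pattern.compile("^file_(\\d+)_\\d+$");

    private FileNameId() {}

    /** Returns the id as a string, or null if the name does not match. */
    static String extractId(String path) {
        // Keep only the last path component, e.g. "file_1234_00001".
        String name = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(1) : null;
    }
}
```

A custom LoadFunc could call something like FileNameId.extractId(...) on the path of the file it is reading and append the result to each tuple as the second column. How to obtain that path inside the loader is exactly the part I am unsure about.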
My questions are:
1. Is there an existing function that I can use to get the id from the file
2. I think this method in Pig 0.6.0 could help me:
long offset, long end)
Specifies a portion of an InputStream to read tuples.
but I can't find the same method in Pig 0.8.1.
Which method can I use to work with the input file in the Pig 0.8.1 API?
Thanks very much.