Pig user mailing list
How to get/operate the InputFileName in pig 0.8.1
Hi,

I have some files in hdfs://path/load/ like these:
file_29_00001
file_47_00001
file_16_00001
...
These files are generated by other M/R jobs. Each file contains only one
column, and the number in the file name between 'file_' and '_00001' is an
id.
I want to add the id to each record when the files are loaded, like this (I
think I need to write a LoadFunc to get the id):
a = load '/path/load/' using com.company.pig.GetIDFromFileName();
dump a;
-- here relation 'a' will have two columns: the original column and the id.
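A minimal sketch of just the id-extraction step, assuming every file name follows the file_<id>_<part> pattern shown above. The class name FileNameId and method extractId are hypothetical, not part of Pig. In Pig 0.8.x loaders are built on Hadoop InputFormats rather than the old bindTo(...) stream API, so a custom LoadFunc would typically look up the current file's path from the split handed to prepareToRead(RecordReader, PigSplit) (for example by unwrapping it to a Hadoop FileSplit and calling getPath()), call a helper like this, and append the id to each tuple in getNext(). Those method names are worth verifying against the 0.8.1 javadoc; also note that Pig 0.8 can combine small input splits, so one split may cover several files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameId {
    // Assumed naming scheme: "file_" + numeric id + "_" + numeric part, e.g. file_29_00001.
    private static final Pattern ID_PATTERN = Pattern.compile("^file_(\\d+)_\\d+$");

    /** Returns the id embedded in a file name or path, or null if the name does not match. */
    public static Integer extractId(String pathOrName) {
        // Drop any directory prefix such as hdfs://path/load/ (lastIndexOf returns -1 if none).
        String name = pathOrName.substring(pathOrName.lastIndexOf('/') + 1);
        Matcher m = ID_PATTERN.matcher(name);
        if (!m.matches()) {
            return null;
        }
        return Integer.valueOf(m.group(1));
    }

    public static void main(String[] args) {
        String[] names = {"hdfs://path/load/file_29_00001", "file_47_00001", "file_16_00001", "part-00000"};
        for (String n : names) {
            System.out.println(n + " -> " + extractId(n));
        }
    }
}
```

Returning null for non-matching names lets the loader decide whether to skip such files or emit a null id column.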

My questions are:
1. Is there an existing function that can get the id from the file name?
2. I think this method from Pig 0.6.0 could help me
(http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/builtin/PigStorage.html#bindTo(java.lang.String, org.apache.pig.impl.io.BufferedPositionedInputStream, long, long)):
bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
          Specifies a portion of an InputStream to read tuples.
but I can't find an equivalent method in Pig 0.8.1.
Which method in the Pig 0.8.1 API can I use to access the input file?

Thanks very much.