Pig, mail # user - How to get/operate the InputFileName in pig 0.8.1

Re: How to get/operate the InputFileName in pig 0.8.1
Jameson Li 2011-06-16, 13:09
Great. Following the wiki page
http://wiki.apache.org/pig/PigStorageWithInputPath and
the setting -Dpig.noSplitCombination=true, I can get the filename in the

But I have another problem.
I modified the UDF code, rebuilt it with ant, and generated a new jar file (I am
sure the jar file has been updated). Then I ran:
pig -x local
register /home/user/project/lib/myUDF.jar
a = load 'aaa';
b = foreach a generate com.company.pig.myUDF();
dump b;

I found that the result still comes from the old jar file and UDF class, so I
think the UDF classes have been cached somewhere.

Am I right?
And how can I use the newest jar file after recompiling?
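
One way to check which jar a class is really loaded from is to ask the JVM for the class's code source. The sketch below is only a diagnostic and does not show whether Pig caches anything; WhichJar and locationOf are hypothetical names used for illustration. The same call could be made on com.company.pig.myUDF from inside the UDF and the result logged.

```java
import java.security.CodeSource;

final class WhichJar {
    /** Describes where the JVM loaded the given class from. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String)
        // usually have no code source, so guard against null.
        if (src == null || src.getLocation() == null) {
            return "(no code source; bootstrap class path)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Prints the directory or jar that WhichJar itself was loaded from.
        System.out.println(locationOf(WhichJar.class));
    }
}
```

If the printed location is not /home/user/project/lib/myUDF.jar, an older copy of the class is probably being picked up from somewhere else on the classpath.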

Thanks very much.

2011/6/15 Daniel Dai <[EMAIL PROTECTED]>

> Check http://wiki.apache.org/pig/PigStorageWithInputPath. You will also
> need to disable split combination: -Dpig.noSplitCombination=true
> Daniel
> On 06/13/2011 04:07 AM, Jameson Li wrote:
> Hi,
> I have some files in hdfs://path/load/ like this:
> file_29_00001
> file_47_00001
> file_16_00001
> ...
> These files are generated by other M/R jobs. Each file contains only one
> column, and the number in the file name between 'file_' and '_00001' is an
> id.
> I want to add the id to the loaded data, like this (I think I need to
> write a LoadFunc to get the id):
> a = load '/path/load/' using com.company.pig.GetIDFromFileName();
> dump a;
> Here the relation 'a' would have two columns: the original column and
> the id.
> My questions are:
> 1. Is there an existing function that can get the id from the file
> name?
> 2. I think this method in Pig 0.6.0 could help me:
> bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
> <http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/builtin/PigStorage.html#bindTo(java.lang.String,org.apache.pig.impl.io.BufferedPositionedInputStream,long,long)>
>           Specifies a portion of an InputStream to read tuples.
> But I can't find the same method in Pig 0.8.1.
> Which method can I use to work with the input file in the Pig 0.8.1 API?
> Thanks very much.
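
For the file names quoted above (file_29_00001 and so on), the id can be pulled out of the name with a regular expression once the loader knows the input path. This is a minimal plain-Java sketch, not Pig API code: FileNameId and extractId are hypothetical names, and the pattern assumes every name has the form file_<id>_<digits> with an id that fits in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameId {
    // Matches names such as file_29_00001; group 1 captures the id.
    private static final Pattern NAME = Pattern.compile("file_(\\d+)_\\d+");

    /** Returns the id in the last path component, or -1 if the name does not match. */
    static int extractId(String path) {
        // Drop any directory or URI prefix and keep only the file name.
        String name = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(extractId("hdfs://path/load/file_29_00001")); // prints 29
        System.out.println(extractId("file_47_00001"));                  // prints 47
    }
}
```

A custom loader could call a helper like this on the path of the current split and append the result to each tuple as the second column.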