Pig, mail # user - How to get/operate the InputFileName in pig 0.8.1


Re: How to get/operate the InputFileName in pig 0.8.1
Jameson Li 2011-06-17, 09:46
Another question:

The class org.apache.pig.piggybank.storage.MultiStorage can help me store
the Pig output into different directories.
But I want the output files not to contain the field at 'splitFieldIndex'.
For example:
Input file:
id name
--------
1 jack
1 tom
1 lily
2 cat
2 pig
2 bird

After using MultiStorage('/my/home/output','0', 'bz2', '\\t'), I would
normally get the files below, with these contents:
1/1-0
------
1 jack
1 tom
1 lily

2/2-0
------
2 cat
2 pig
2 bird

What I want instead is these files and contents:
1/1-0
------
jack
tom
lily

2/2-0
------
cat
pig
bird

Is there a switch I can use to control whether the stored files contain
the 'splitFieldIndex' field or not?

I have looked at the code, and it seems that the answer is no.
Maybe I have to write another class, like
MultiStorageSwithWriteKey, that extends the class MultiStorageSwithKey.
Am I right?
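
To make the idea concrete, here is a minimal sketch of just the field-dropping step, written as plain Java on delimited strings. It is not MultiStorage's code and does not use any Pig API; the class name SplitFieldStripper and the method stripField are hypothetical names made up for this illustration. A real store function would presumably work on Pig tuples rather than on text lines, so treat this only as a model of the output I am after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Pig or piggybank): models the desired
// output by dropping the split field from one delimited record.
final class SplitFieldStripper {

    // Returns the record without the field at splitFieldIndex.
    static String stripField(String line, String delimiter, int splitFieldIndex) {
        // Quote the delimiter so it is matched literally; limit -1 keeps
        // trailing empty fields instead of silently discarding them.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (splitFieldIndex < 0 || splitFieldIndex >= fields.length) {
            throw new IllegalArgumentException(
                "splitFieldIndex " + splitFieldIndex + " is out of range for: " + line);
        }
        List<String> kept = new ArrayList<>(fields.length - 1);
        for (int i = 0; i < fields.length; i++) {
            if (i != splitFieldIndex) {
                kept.add(fields[i]);
            }
        }
        return String.join(delimiter, kept);
    }

    public static void main(String[] args) {
        // Sample rows from the input file above, tab-delimited.
        String[] rows = {"1\tjack", "1\ttom", "1\tlily", "2\tcat", "2\tpig", "2\tbird"};
        for (String row : rows) {
            // The split field (index 0) still decides the directory;
            // only the remaining fields would be written to the file.
            String key = row.split("\t", -1)[0];
            System.out.println(key + "/" + key + "-0 <- " + stripField(row, "\t", 0));
        }
    }
}
```

With a step like this applied just before each record is written, the key would still choose the output directory but would no longer appear in the file contents.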

Thanks very much.
2011/6/17 Jameson Li <[EMAIL PROTECTED]>

> I am sorry, this was my mistake.
> My newest jar file is in /home/user/project/lib/myUDF.jar, but there is
> an old jar file in the Pig lib dir $PIG-HOME/lib (/opt/pig/lib).
> Unfortunately, even after registering /home/user/project/lib/myUDF.jar,
> when the Pig script is executed it first scans for the UDF classes in
> the jar files in the Pig lib dir.
>
> 2011/6/17 Daniel Dai <[EMAIL PROTECTED]>
>
>> Should not be. Pig does not cache myUDF.jar. Every run will pick myUDF.jar
>> again from /home/user/project/lib.
>>
>
>