Pig >> mail # user >> How to get/operate the InputFileName in pig 0.8.1


Re: How to get/operate the InputFileName in pig 0.8.1
Another question:

The class *org.apache.pig.piggybank.storage.MultiStorage* can help me store
the Pig output in different directories.
But I want the output files not to contain the 'splitFieldIndex' field.
For example:
Input file:
id name
--------
1 jack
1 tom
1 lily
2 cat
2 pig
2 bird

After using MultiStorage('/my/home/output','0', 'bz2', '\\t'), I would
normally get the following files and contents:
1/1-0
------
1 jack
1 tom
1 lily

2/2-0
------
2 cat
2 pig
2 bird

Instead, I want to get these files and contents:
1/1-0
------
jack
tom
lily

2/2-0
------
cat
pig
bird

Is there a switch that controls whether the stored files contain the
'splitFieldIndex' field?

I have looked at the code, and the answer seems to be no.
Maybe I have to write another class, such as
MultiStorageSwithWriteKey, that extends the class MultiStorageSwithKey.
Am I right?
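
To make the desired behavior concrete, here is a minimal sketch in plain Java of the "route each row by the key field, then leave the key out of what gets written" logic. It does not use the Pig API at all: the class name SplitFieldDropSketch and the methods dropField and partition are hypothetical names made up for illustration, and an in-memory map stands in for the per-key output directories. A real solution would presumably need this logic inside a custom store function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: plain Java with no Pig dependency. It only mimics the
// "split by key field, then omit the key" behavior asked about above.
class SplitFieldDropSketch {

    // Returns the row with the field at splitFieldIndex removed.
    static String dropField(String row, int splitFieldIndex, String delimiter) {
        String[] fields = row.split(Pattern.quote(delimiter), -1);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (i != splitFieldIndex) {
                kept.add(fields[i]);
            }
        }
        return String.join(delimiter, kept);
    }

    // Groups rows by the value of the split field. Each map entry stands in
    // for one output directory; the stored rows no longer contain the key.
    static Map<String, List<String>> partition(List<String> rows,
                                               int splitFieldIndex,
                                               String delimiter) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(Pattern.quote(delimiter), -1);
            String key = fields[splitFieldIndex];
            out.computeIfAbsent(key, k -> new ArrayList<>())
               .add(dropField(row, splitFieldIndex, delimiter));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1\tjack", "1\ttom", "1\tlily", "2\tcat", "2\tpig", "2\tbird");
        // Prints one line per key with the remaining fields of its rows.
        partition(rows, 0, "\t")
            .forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

With the sample input from this message, the group for key 1 holds jack, tom, lily and the group for key 2 holds cat, pig, bird, which matches the file contents I am after.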

Thanks very much.
2011/6/17 Jameson Li <[EMAIL PROTECTED]>

> Sorry, this was my mistake.
> My newest jar file is /home/user/project/lib/myUDF.jar, but
> there was an old jar file in the Pig lib dir $PIG-HOME/lib (/opt/pig/lib).
> Unfortunately, even after registering the jar file
> /home/user/project/lib/myUDF.jar, when the Pig code executed, it
> first scanned the UDF classes in the jar files in the Pig lib dir.
>
> 2011/6/17 Daniel Dai <[EMAIL PROTECTED]>
>
>> Should not be. Pig does not cache myUDF.jar. Every run will pick myUDF.jar
>> again from /home/user/project/lib.
>>
>
>