Pig >> mail # dev >> [UDF load function] My UDF load function has a strange issue, please help.


[UDF load function] My UDF load function has a strange issue, please help.
Hi,
 
I wrote a UDF load function, "public class CslTextLoader extends FileInputLoadFunc implements LoadPushDown, LoadMetadata { ......}", modeled on PigStorage.
Compared to PigStorage, CslTextLoader lacks the store feature and changes the input format like this:
    @Override
    public InputFormat getInputFormat() {
        return new XRecordInputFormat();
    }
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        in = (XLineRecordReader)reader;
        if (tagFile || tagPath) {
            sourcePath = ((FileSplit)split.getWrappedSplit()).getPath();
        }
    }
I have attached XRecordInputFormat (which extends FileInputFormat<LongWritable, Text> and is not the same as PigTextInputFormat) and XLineRecordReader to this email.
 
I wrote a Pig script like this:
A = load 'pigTest/pigTest.txt' using CslTextLoader('=') as (f0,f1,f2,f3,f4:{(g1)},f5,f6:{(t1)});
C = filter A by f1 == 'LogRouteUpdate';
illustrate C;
D = foreach C generate f0 as LogLen, f1 as LogType, f5 as Strenth, f6 as OtherStren:{(t1:int)};
illustrate D;
dump D;
 
The issue is this:
"illustrate D" works correctly and prints the following:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| C     | f0:bytearray    | f1:bytearray    | f2:bytearray    | f3:bytearray    | f4:bag{:tuple(t1:bytearray)}          | f5:bytearray    | f6:bag{:tuple(t1:bytearray)}          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       | 119             | LogRouteUpdate  | 10.88.46.100    | null            | {}                                    | 8               | {(13), (8)}                           |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
| D     | LogLen:bytearray    | LogType:bytearray    | Strenth:bytearray    | OtherStren:bag{:tuple(t1:int)}          |
-----------------------------------------------------------------------------------------------------------------------
|       | 119                 | LogRouteUpdate       | 8                    | {(13), (8)}                             |
-----------------------------------------------------------------------------------------------------------------------
 
But "dump D" goes wrong: it does NOT locate f5 and f6 correctly. The dump result looks like this:
(119,LogRouteUpdate,10.88.46.100,)
And "store D using PigStorage" gives the same wrong result.
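 
To make the symptom concrete, here is a minimal, self-contained Java sketch. It does not use any Pig or Hadoop API; the class ProjectionSketch and its methods are hypothetical names made up for illustration. It assumes (this is only a guess, not something confirmed) that column pruning through LoadPushDown is involved: D needs only columns 0, 1, 5 and 6, and the dump result looks like what would come out if the first four fields of the row were returned instead of those four columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, no Pig classes.
public class ProjectionSketch {

    // Keep only the fields whose index is marked as required.
    static List<String> project(List<String> fields, boolean[] required) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            if (i < required.length && required[i]) {
                out.add(fields.get(i));
            }
        }
        return out;
    }

    // Assumed failure mode: the projection is ignored and the
    // first n fields of the row are used instead.
    static List<String> firstN(List<String> fields, int n) {
        return new ArrayList<>(fields.subList(0, Math.min(n, fields.size())));
    }

    // Render a row roughly the way dump prints a tuple (null prints as empty).
    static String render(List<String> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String field = tuple.get(i);
            sb.append(field == null ? "" : field);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // The row shown by illustrate for C: f0 .. f6.
        List<String> row = Arrays.asList(
                "119", "LogRouteUpdate", "10.88.46.100", null, "{}", "8", "{(13),(8)}");
        // D uses f0, f1, f5 and f6 only.
        boolean[] required = {true, true, false, false, false, true, true};

        System.out.println(render(project(row, required))); // (119,LogRouteUpdate,8,{(13),(8)})
        System.out.println(render(firstN(row, 4)));         // (119,LogRouteUpdate,10.88.46.100,)
    }
}
```

The second printed line matches the dump result above, which is why the projection handling in the loader may be a place to look. This is only a sketch under the stated assumption.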
 
It seems that the AS schema is not matched correctly to the generate statement when running dump or store. What should I pay attention to when rewriting the InputFormat and RecordReader so that schema matching works for generate and dump? Can you give me some suggestions? Thank you!
 
BR.
Squall Luo