Pig >> mail # dev >> [UDF load function] My UDF load function has a strange issue, please help.


[UDF load function] My UDF load function has a strange issue, please help.
Hi,
 
I wrote a UDF load function, "public class CslTextLoader extends FileInputLoadFunc implements LoadPushDown, LoadMetadata { ......}", modeled on PigStorage.
Compared to PigStorage, CslTextLoader lacks the store feature and changes the input format as follows:
    @Override
    public InputFormat getInputFormat() {
        return new XRecordInputFormat();
    }
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        in = (XLineRecordReader)reader;
        if (tagFile || tagPath) {
            sourcePath = ((FileSplit)split.getWrappedSplit()).getPath();
        }
    }
I have attached XRecordInputFormat (which extends FileInputFormat<LongWritable, Text> and is not the same as PigTextInputFormat) and XLineRecordReader to this email.
 
I wrote a Pig script like this:
A = load 'pigTest/pigTest.txt' using CslTextLoader('=') as (f0,f1,f2,f3,f4:{(g1)},f5,f6:{(t1)});
C = filter A by f1 == 'LogRouteUpdate';
illustrate C;
D = foreach C generate f0 as LogLen, f1 as LogType, f5 as Strenth, f6 as OtherStren:{(t1:int)};
illustrate D;
dump D;
 
The issue is this:
D is illustrated correctly, like this:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| C     | f0:bytearray    | f1:bytearray    | f2:bytearray    | f3:bytearray    | f4:bag{:tuple(t1:bytearray)}          | f5:bytearray    | f6:bag{:tuple(t1:bytearray)}          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|       | 119             | LogRouteUpdate  | 10.88.46.100    | null            | {}                                    | 8               | {(13), (8)}                           |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
| D     | LogLen:bytearray    | LogType:bytearray    | Strenth:bytearray    | OtherStren:bag{:tuple(t1:int)}          |
-----------------------------------------------------------------------------------------------------------------------
|       | 119                 | LogRouteUpdate       | 8                    | {(13), (8)}                             |
-----------------------------------------------------------------------------------------------------------------------
 
But dump D gives the wrong result: it does NOT locate f5 and f6 correctly. The dump output looks like this:
(119,LogRouteUpdate,10.88.46.100,)
And "store D using PigStorage" produces the same wrong result.
 
It seems that the AS schema is not being matched when generate, dump, or store runs. What should I pay attention to when rewriting the InputFormat and RecordReader so that the schema is matched correctly by generate and dump? Can you give me some suggestions? Thank you!
 
BR.
Squall Luo