Re: LOAD multiple files with glob
Cheolsoo Park 2012-11-26, 18:59
Yes, it is. Joe has unit test cases for path globbing in his patch:
https://reviews.apache.org/r/8104/diff/#index_header

On Mon, Nov 26, 2012 at 8:23 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Is the globbing feature making it into the AvroStorage rewrite?
>
> Russell Jurney twitter.com/rjurney
>
>
> On Nov 26, 2012, at 7:50 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:
>
> > To answer myself again, I compiled Pig 0.11 and Piggybank, and it's working very well now. Globbing seems to be fully supported!
> >
> > Bart Verwilst wrote on 26.11.2012 15:33:
> >> To answer myself, could this be part of the solution?
> >>
> >> https://issues.apache.org/jira/browse/PIG-2492
> >>
> >> Guess I'll have to wait for 0.11 then?
> >>
> >> Bart Verwilst wrote on 26.11.2012 14:19:
> >>> 14:16:08 centos6-hadoop-hishiru ~ $ cat avro-test.pig
> >>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> >>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> >>> REGISTER 'hdfs:///lib/piggybank.jar';
> >>>
> >>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> >>> avro = load '/test/*' USING AvroStorage();
> >>> describe avro;
> >>>
> >>> 14:16:09 centos6-hadoop-hishiru ~ $ pig avro-test.pig
> >>> Schema for avro unknown.
> >>>
> >>> 14:16:17 centos6-hadoop-hishiru ~ $ vim avro-test.pig
> >>>
> >>> 14:16:25 centos6-hadoop-hishiru ~ $ cat avro-test.pig
> >>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> >>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> >>> REGISTER 'hdfs:///lib/piggybank.jar';
> >>>
> >>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> >>> avro = load '/test/2012-11-25.avro' USING AvroStorage();
> >>> describe avro;
> >>>
> >>> 14:16:30 centos6-hadoop-hishiru ~ $ pig avro-test.pig
> >>> avro: {id: long,timestamp: long,latitude: int,longitude: int,speed:
> >>> int,heading: int,terminalid: int,customerid: chararray,mileage:
> >>> int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM:
> >>> (id: long,value: chararray,pkey: chararray)}}
> >>>
> >>> 14:16:55 centos6-hadoop-hishiru ~ $ hadoop fs -ls /test/
> >>> Found 1 items
> >>> -rw-r--r-- 3 hdfs supergroup 63140500 2012-11-26 14:13 /test/2012-11-25.avro
> >>>
> >>> Cheolsoo Park wrote on 26.11.2012 10:45:
> >>>> Hi,
> >>>>
> >>>>>> Invalid field projection. Projected field [tracetype] does not exist.
> >>>>
> >>>> The error indicates that "tracetype" doesn't exist in the Pig schema of
> >>>> the relation "avro". AvroStorage automatically converts the Avro schema
> >>>> to a Pig schema during the load. Although you have "tracetype" in your
> >>>> Avro schema, "tracetype" doesn't exist in the generated Pig schema for
> >>>> whatever reason.
> >>>>
> >>>> Can you please try "describe avro"? You can replace the group and dump
> >>>> commands with describe in your Pig script. This will show you what the
> >>>> Pig schema of "avro" is. If "tracetype" indeed doesn't exist, you have to
> >>>> find out why. It could be because the .avro files don't all have the
> >>>> same schema, or because of a bug in AvroStorage, etc.
> >>>>
> >>>>>> Maybe globbing with [] doesn't work, but wildcard works?
> >>>>
> >>>> You're right.
> >>>> AvroStorage internally uses Hadoop path globbing, and Hadoop path
> >>>> globbing doesn't support '[ ]'. But the above error (Projected field
> >>>> [tracetype] does not exist) is not caused by this. A URISyntaxException
> >>>> is what you will get because of '[ ]'.
> >>>>
> >>>> Thanks,
> >>>> Cheolsoo
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Just tried this:
> >>>>>
> >>>>>
> >>>>> ----------------------------------------------------
> >>>>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> >>>>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> >>>>> REGISTER 'hdfs:///lib/piggybank.jar';
> >>>>>
> >>>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> >>>>
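The point above, that '[ ]' in a load location leads to a URISyntaxException while '*' does not, can be reproduced with plain java.net.URI. This is a minimal sketch that assumes the location string reaches the single-argument URI(String) constructor somewhere on AvroStorage's load path. The thread does not say where that happens. The class name UriGlobCheck, its methods, and the bracketed path are hypothetical. URI(String) rejects '[' and ']' in a path component but accepts '*', so only the bracketed location fails. The five-argument constructor quotes such characters instead of rejecting them.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: checks whether a Pig load location survives java.net.URI parsing.
// Assumption (not confirmed by the thread): the loader hands the raw location string
// to the single-argument URI(String) constructor somewhere on its load path.
public class UriGlobCheck {

    // URI(String) parses strictly and throws on characters such as '[' or ']'
    // that are not allowed in a path component; '*' is allowed.
    static boolean parsesStrict(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The five-argument constructor quotes illegal path characters
    // (e.g. '[' becomes %5B) before parsing, so it does not throw here.
    static boolean parsesQuoted(String path) {
        try {
            new URI(null, null, path, null, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] locations = {
            "/test/*",                  // wildcard location used in the thread
            "/test/2012-11-25.avro",    // single-file location used in the thread
            "/test/2012-11-2[45].avro"  // hypothetical bracket glob
        };
        for (String loc : locations) {
            System.out.println(loc + " strict=" + parsesStrict(loc)
                    + " quoted=" + parsesQuoted(loc));
        }
    }
}
```

Running `main` prints one line per location. Only the bracketed path fails the strict check, while all three pass the quoted one.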