Pig >> mail # user >> LOAD multiple files with glob


Bart Verwilst 2012-11-23, 20:45
Deepak Tiwari 2012-11-23, 23:41
Bart Verwilst 2012-11-24, 13:15
Russell Jurney 2012-11-24, 19:23
Bart Verwilst 2012-11-25, 11:02
Cheolsoo Park 2012-11-25, 14:33
Bart Verwilst 2012-11-25, 20:25
Cheolsoo Park 2012-11-26, 09:45
Re: LOAD multiple files with glob
14:16:08  centos6-hadoop-hishiru  ~ $ cat avro-test.pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/test/*' USING AvroStorage();
describe avro;

14:16:09  centos6-hadoop-hishiru  ~ $ pig avro-test.pig
Schema for avro unknown.

14:16:17  centos6-hadoop-hishiru  ~ $ vim avro-test.pig

14:16:25  centos6-hadoop-hishiru  ~ $ cat avro-test.pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/test/2012-11-25.avro' USING AvroStorage();
describe avro;

14:16:30  centos6-hadoop-hishiru  ~ $ pig avro-test.pig
avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}

14:16:55  centos6-hadoop-hishiru  ~ $ hadoop fs -ls /test/
Found 1 items
-rw-r--r--   3 hdfs supergroup   63140500 2012-11-26 14:13 /test/2012-11-25.avro

Cheolsoo Park wrote on 26.11.2012 10:45:
> Hi,
>
>>> Invalid field projection. Projected field [tracetype] does not exist.
>
> The error indicates that "tracetype" doesn't exist in the Pig schema of
> the relation "avro". AvroStorage automatically converts the Avro schema
> to a Pig schema during the load. Although you have "tracetype" in your
> Avro schema, it doesn't exist in the generated Pig schema for whatever
> reason.
>
> Can you please try "describe avro"? You can replace the group and dump
> commands with describe in your Pig script. This will show you what the
> Pig schema of "avro" is. If "tracetype" indeed doesn't exist, you have
> to find out why. It could be because the schemas of the .avro files are
> not the same, or because there is a bug in AvroStorage, etc.
>
>>> Maybe globbing with [] doesn't work, but wildcard works?
>
> You're right. AvroStorage internally uses Hadoop path globbing, and
> Hadoop path globbing doesn't support '[ ]'. But the above error
> (Projected field [tracetype] does not exist) is not caused by this;
> '[ ]' would give you a URISyntaxException instead.
>
> Thanks,
> Cheolsoo
>
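Editor's sketch of the URISyntaxException point above. It assumes the load path ends up being parsed by java.net.URI, which the thread does not confirm; the bracket path is hypothetical (the original '[ ]' pattern is not quoted in this message), and the class and method names are made up for illustration. java.net.URI accepts '*' in a path but rejects '[' and ']' there, which would explain why the wildcard load gets past path handling while a bracket glob does not.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriCheck {

    /** Returns true if java.net.URI can parse the string unchanged. */
    public static boolean parsesAsUri(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            // '[' and ']' are not legal in the path component of a URI.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            // Wildcard glob taken from the quoted script.
            "/data/2012/trace_ejb3/2012-01-0*.avro",
            // Hypothetical bracket glob, for illustration only.
            "/data/2012/trace_ejb3/2012-01-0[1-3].avro"
        };
        for (String candidate : candidates) {
            String verdict = parsesAsUri(candidate) ? "parses" : "URISyntaxException";
            System.out.println(candidate + " -> " + verdict);
        }
    }
}
```

This only shows how java.net.URI treats the two strings; it says nothing about the "Schema for avro unknown" result, which is a separate problem.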
>
>
> On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:
>
>> Just tried this:
>>
>>
>> ----------------------------------------------------
>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>> REGISTER 'hdfs:///lib/piggybank.jar';
>>
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();
>>
>> groups = group avro by tracetype;
>>
>> dump groups;
>> ----------------------------------------------------
>>
>> gave me:
>>
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> Pig Stack Trace
>> ---------------
>> ERROR 1025:
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
>>         at org.apache.pig.PigServer.openIterator(PigServer.java:862)
>>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
Bart Verwilst 2012-11-26, 14:33
Bart Verwilst 2012-11-26, 15:50
Bart Verwilst 2012-11-26, 12:48
Bart Verwilst 2012-11-25, 20:14