Pig >> mail # user >> LOAD multiple files with glob


Bart Verwilst 2012-11-23, 20:45
Deepak Tiwari 2012-11-23, 23:41
Bart Verwilst 2012-11-24, 13:15
Russell Jurney 2012-11-24, 19:23
Bart Verwilst 2012-11-25, 11:02
Cheolsoo Park 2012-11-25, 14:33
Bart Verwilst 2012-11-25, 20:25
Cheolsoo Park 2012-11-26, 09:45
Re: LOAD multiple files with glob
14:16:08  centos6-hadoop-hishiru  ~ $ cat avro-test.pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/test/*' USING AvroStorage();
describe avro;

14:16:09  centos6-hadoop-hishiru  ~ $ pig avro-test.pig
Schema for avro unknown.

14:16:17  centos6-hadoop-hishiru  ~ $ vim avro-test.pig

14:16:25  centos6-hadoop-hishiru  ~ $ cat avro-test.pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/test/2012-11-25.avro' USING AvroStorage();
describe avro;

14:16:30  centos6-hadoop-hishiru  ~ $ pig avro-test.pig
avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}

14:16:55  centos6-hadoop-hishiru  ~ $ hadoop fs -ls /test/
Found 1 items
-rw-r--r--   3 hdfs supergroup   63140500 2012-11-26 14:13 /test/2012-11-25.avro

Cheolsoo Park wrote on 26.11.2012 10:45:
> Hi,
>
>>> Invalid field projection. Projected field [tracetype] does not exist.
>
> The error indicates that the field "tracetype" doesn't exist in the Pig
> schema of the relation "avro". AvroStorage automatically converts the
> Avro schema to a Pig schema during the load. Although you have
> "tracetype" in your Avro schema, it doesn't exist in the generated Pig
> schema for whatever reason.
>
> Can you please try "describe avro"? You can replace the group and dump
> commands with describe in your Pig script. This will show you what the
> Pig schema of "avro" is. If "tracetype" indeed doesn't exist, you have
> to find out why. It could be because the .avro files don't all share
> the same schema, because there is a bug in AvroStorage, etc.
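
One way to check the "schemas differ between files" possibility outside of Pig is to read the schema stored in each file's header. The following is a minimal sketch, not part of AvroStorage: it assumes the files are standard Avro object container files (magic bytes, then a metadata map holding "avro.schema") and that they are available as local byte streams. The function names are made up for illustration.

```python
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Decode one length-prefixed Avro bytes/string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_header_metadata(stream):
    """Return the header metadata map (str -> bytes) of a container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:   # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def top_level_fields(stream):
    """List the top-level field names of the file's record schema."""
    schema = json.loads(read_header_metadata(stream)["avro.schema"].decode("utf-8"))
    if isinstance(schema, dict) and "fields" in schema:
        return [field["name"] for field in schema["fields"]]
    return []


def group_by_fields(named_streams):
    """Group (name, stream) pairs by their tuple of top-level field names."""
    groups = {}
    for name, stream in named_streams:
        groups.setdefault(tuple(top_level_fields(stream)), []).append(name)
    return groups
```

For files copied out of HDFS (for example with hadoop fs -get), passing pairs of (path, open(path, "rb")) to group_by_fields gives one entry per distinct field list. More than one entry would suggest the files do not share a schema; a single entry that contains "tracetype" would point back at the loader.
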
>
>>> Maybe globbing with [] doesn't work, but wildcard works?
>
> You're right. AvroStorage internally uses Hadoop path globbing, and
> Hadoop path globbing doesn't support '[ ]'. But the above error
> (Projected field [tracetype] does not exist) is not caused by this.
> URISyntaxException is what you will get because of '[ ]'.
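
When a '[ ]' pattern is swapped for a plain wildcard, the wildcard can select more files than the bracket expression was meant to. A rough way to preview that is sketched below. It uses Python's fnmatch as a stand-in, so its matching rules are Python's and may differ from Hadoop's; the file names are hypothetical, modeled on the naming used in this thread.

```python
from fnmatch import fnmatchcase


def preview_glob(names, pattern):
    """Return the names that a shell-style pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical listing, e.g. the last column of "hadoop fs -ls" output
# with the directory prefix stripped.
names = [
    "2012-01-01.avro",
    "2012-01-09.avro",
    "2012-01-10.avro",
    "2012-02-01.avro",
]

print(preview_glob(names, "2012-01-0[1-5].avro"))  # bracket range
print(preview_glob(names, "2012-01-0*.avro"))      # wildcard replacement
```

With this listing the bracket pattern selects only 2012-01-01.avro, while the wildcard also picks up 2012-01-09.avro.
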
>
> Thanks,
> Cheolsoo
>
> On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:
>
>> Just tried this:
>>
>>
>> ----------------------------------------------------
>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>> REGISTER 'hdfs:///lib/piggybank.jar';
>>
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();
>>
>> groups = group avro by tracetype;
>>
>> dump groups;
>> ----------------------------------------------------
>>
>> gave me:
>>
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> Pig Stack Trace
>> ---------------
>> ERROR 1025:
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
>>         at org.apache.pig.PigServer.openIterator(PigServer.java:862)
>>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)

Bart Verwilst 2012-11-26, 14:33
Bart Verwilst 2012-11-26, 15:50
Bart Verwilst 2012-11-26, 12:48
Bart Verwilst 2012-11-25, 20:14