Pig >> mail # user >> LOAD multiple files with glob


Bart Verwilst 2012-11-26, 15:50
Re: LOAD multiple files with glob
Hi Cheolsoo,

Describe shows me:

avro: {id: long,timestamp: long,latitude: int,longitude: int,speed:
int,heading: int,terminalid: int,customerid: chararray,mileage:
int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id:
long,value: chararray,pkey: chararray)}}

(tracetype is there.)

So tracetype should work. I also tried avro.tracetype and
trace.tracetype, but that didn't help.

Still, I've gotten us a bit sidetracked by this, since the original
issue was that with wildcard globbing I get "Schema for avro unknown." :)

Kind regards,

Bart

Cheolsoo Park wrote on 26.11.2012 10:45:
> Hi,
>
>>> Invalid field projection. Projected field [tracetype] does not
>>> exist.
>
> The error indicates that "tracetype" doesn't exist in the Pig schema
> of the relation "avro". What AvroStorage does is automatically convert
> the Avro schema to a Pig schema during the load. Although you have
> "tracetype" in your Avro schema, "tracetype" doesn't exist in the
> generated Pig schema for whatever reason.
>
> Can you please try "describe avro"? You can replace the group and dump
> commands with describe in your Pig script. This will show you what the
> Pig schema of "avro" is. If "tracetype" indeed doesn't exist, you have
> to find out why it doesn't. It could be because the schemas of the
> .avro files are not the same, or because there is a bug in
> AvroStorage, etc.
>
>>> Maybe globbing with [] doesn't work, but wildcard works?
>
> You're right. AvroStorage internally uses Hadoop path globbing, and
> Hadoop path globbing doesn't support '[ ]'. But the above error
> (Projected field [tracetype] does not exist) is not because of this.
> URISyntaxException is what you will get because of '[ ]'.
>
> Thanks,
> Cheolsoo
>
>
>
> On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[EMAIL PROTECTED]>
> wrote:
>
>> Just tried this:
>>
>>
>> ----------------------------------------------------
>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>> REGISTER 'hdfs:///lib/piggybank.jar';
>>
>> DEFINE AvroStorage
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING
>> AvroStorage();
>>
>> groups = group avro by tracetype;
>>
>> dump groups;
>> ----------------------------------------------------
>>
>> gave me:
>>
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> Pig Stack Trace
>> ---------------
>> ERROR 1025:
>> <file avro-test.pig, line 10, column 23> Invalid field projection.
>> Projected field [tracetype] does not exist.
>>
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066:
>> Unable to open iterator for alias groups
>>         at org.apache.pig.PigServer.openIterator(PigServer.java:862)
>>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>         at org.apache.pig.Main.run(Main.java:555)
>>         at org.apache.pig.Main.main(Main.java:111)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store
>> alias groups
>>         at org.apache.pig.PigServer.storeEx(PigServer.java:961)
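
The diagnostic step suggested in the thread (replace the group and dump commands with describe) could look roughly like the sketch below. The jar locations, the load path, and the alias name are copied from the quoted script; whether they resolve on any given cluster is an assumption, and this is only an illustration of the check, not a fix for the "Schema for avro unknown." problem.

```pig
-- Same setup as the quoted script; paths are taken from the thread as-is.
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Load with the wildcard glob from the thread.
avro = LOAD '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();

-- Print the Pig schema that AvroStorage derived for the relation,
-- instead of grouping and dumping. If "tracetype" is missing here,
-- the projection error in the later GROUP is expected.
DESCRIBE avro;
```

Running the same script once against a single file and once against the glob makes it easier to tell whether the schema is lost only when several files are matched.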