Pig, mail # user - LOAD multiple files with glob


Re: LOAD multiple files with glob
Cheolsoo Park 2012-11-26, 09:45
Hi,

>> Invalid field projection. Projected field [tracetype] does not exist.

The error indicates that the field "tracetype" doesn't exist in the Pig
schema of the relation "avro". AvroStorage automatically converts the Avro
schema to a Pig schema during the load. Although you have "tracetype" in
your Avro schema, it is missing from the generated Pig schema for some
reason.

Can you please try "describe avro"? You can replace the group and dump
commands with describe in your Pig script. This will show you what the Pig
schema of "avro" is. If "tracetype" indeed doesn't exist, you have to find
out why it doesn't. It could be because the .avro files don't all share the
same schema, or because there is a bug in AvroStorage, etc.
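
A minimal sketch of that check, reusing the REGISTER, DEFINE and LOAD lines
from your script as quoted below. I have dropped the stray "**" that the mail
archive inserted, so the exact jar names and load path are an assumption:

```pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

avro = LOAD '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();

-- Print the Pig schema that AvroStorage derived from the Avro schema,
-- instead of running "group ... by tracetype" and "dump".
DESCRIBE avro;
```

If "tracetype" is not listed in the output, the problem is in the schema
conversion or in the files themselves, not in the group statement.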

>> Maybe globbing with [] doesnt work, but wildcard works?

You're right. AvroStorage internally uses Hadoop path globbing, and Hadoop
path globbing doesn't support '[ ]'. But the above error (Projected field
[tracetype] does not exist) is not caused by this. With '[ ]' you would get
a URISyntaxException instead.

Thanks,
Cheolsoo

On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:

> Just tried this:
>
>
> ----------------------------------------------------
> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> REGISTER 'hdfs:///lib/piggybank.jar';
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();
>
> groups = group avro by tracetype;
>
> dump groups;
> ----------------------------------------------------
>
> gave me:
>
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>
> Pig Stack Trace
> ---------------
> ERROR 1025:
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
>         at org.apache.pig.PigServer.openIterator(PigServer.java:862)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:555)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias groups
>         at org.apache.pig.PigServer.storeEx(PigServer.java:961)
>         at org.apache.pig.PigServer.store(PigServer.java:924)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:837)
>         ... 12 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
> <file avro-test.pig, line 10, column 23> Invalid field projection.
> Projected field [tracetype] does not exist.
>         at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:183)
>         at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:166)
>         at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.