Pig >> mail # user >> LOAD multiple files with glob


Bart Verwilst 2012-11-23, 20:45
Deepak Tiwari 2012-11-23, 23:41
Bart Verwilst 2012-11-24, 13:15
Russell Jurney 2012-11-24, 19:23
Bart Verwilst 2012-11-25, 11:02
Cheolsoo Park 2012-11-25, 14:33

Re: LOAD multiple files with glob
Just tried this:
----------------------------------------------------
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING
AvroStorage();

groups = group avro by tracetype;

dump groups;
----------------------------------------------------

gave me:

<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.

Pig Stack Trace
---------------
ERROR 1025:
<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
at org.apache.pig.PigServer.openIterator(PigServer.java:862)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias groups
at org.apache.pig.PigServer.storeEx(PigServer.java:961)
at org.apache.pig.PigServer.store(PigServer.java:924)
at org.apache.pig.PigServer.openIterator(PigServer.java:837)
... 12 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.
at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:183)
at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:166)
at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.java:53)
at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:101)
at org.apache.pig.newplan.logical.relational.LOCogroup.accept(LOCogroup.java:235)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1621)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1616)
at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1339)
at org.apache.pig.PigServer.storeEx(PigServer.java:956)
... 14 more
===============================================================================

Maybe globbing with [] doesn't work, but a wildcard does? I have no idea
why I get the error above, though.
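
Since the error says the projected field [tracetype] does not exist, one
way to narrow this down might be to print the schema Pig derives for the
relation before grouping on it. This is a minimal sketch that reuses the
jars, path, and alias from the script above. The only addition is the
DESCRIBE statement, and it assumes the same files are still matched by the
glob:

```pig
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Same load statement as in the script above.
avro = LOAD '/data/2012/trace_ejb3/2012-01-0*.avro' USING AvroStorage();

-- Print the schema Pig sees for the alias, without running a job.
-- If tracetype is not listed here, GROUP ... BY tracetype cannot resolve it.
DESCRIBE avro;
```

If tracetype is missing from the printed schema, or the output is "Schema
for avro unknown." as in the quoted message below, the problem is more
likely in how the schema is derived from the matched files than in the
GROUP statement itself.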

Kind regards,

Bart

Cheolsoo Park wrote on 25.11.2012 15:33:
> Hi Bart,
>
> avro = load '/data/2012/trace_ejb3/2012-**01-*.avro' USING
> AvroStorage();
> gives me:
> Schema for avro unknown.
>
> This should work. The error that you're getting is not from
> AvroStorage but from PigServer.
>
> grep -r "Schema for .* unknown" *
> src/org/apache/pig/PigServer.java:
>  System.out.println("Schema for " + alias + " unknown.");
> ...
>
> It looks like you have an error in your Pig script. Can you

Cheolsoo Park 2012-11-26, 09:45
Bart Verwilst 2012-11-26, 13:19
Bart Verwilst 2012-11-26, 14:33
Bart Verwilst 2012-11-26, 15:50
Bart Verwilst 2012-11-26, 12:48
Bart Verwilst 2012-11-25, 20:14