Pig, mail # user - AvroStorage load and store, schema with maps


Re: AvroStorage load and store, schema with maps
Cheolsoo Park 2012-08-23, 18:11
Hi Johannes,

I was able to reproduce your error with the following Avro schema:

{
  "type" : "map",
  "values" : "string"
}
The issue is not in AvroStorage but in the DEFINE statement.

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
AvroStorage has two constructors: one that takes no parameters and one that
takes parameters. To set the output Avro schema, the second one must be used.
But your DEFINE statement causes the first constructor to be used every time,
so the output Avro schema is never set. If you remove the DEFINE statement and
use the fully qualified name of AvroStorage, everything works. For example:

loaded_data = LOAD 'map.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
describe loaded_data;
STORE loaded_data INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '
{
  "type" : "map",
  "values" : "string"
}
');
Now the question is why DEFINE does not work here.
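
If you would rather keep DEFINE, one option that may work is to define a separate alias for loading and for storing, with the constructor arguments given in the DEFINE itself. This is an untested sketch: it assumes that Pig binds constructor arguments at DEFINE time and ignores any arguments passed where the alias is used, which would match the behavior seen above but is not confirmed here. The alias names AvroLoad and AvroStore are made up for illustration.

```pig
-- Untested sketch. Assumption: arguments are bound in the DEFINE,
-- and arguments given at the call site of an alias are ignored.
-- AvroLoad and AvroStore are hypothetical alias names.
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStore org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type":"map","values":"string"}');

loaded_data = LOAD 'map.avro' USING AvroLoad();
describe loaded_data;
STORE loaded_data INTO 'output' USING AvroStore();
```

The schema string is the same map schema as above, written on one line. If the assumption does not hold in your Pig version, the fully qualified form shown earlier remains the safer choice.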

Thanks,
Cheolsoo
On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm trying to execute the following pig script with pig-0.10.0 and yarn
> (cdh4.0.0):
>
> --
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> loaded_data = LOAD '$input' USING AvroStorage();
> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
> --
>
> I call pig this way:
>
> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar \
>   -file script.pig -param input=input.avro -param output=output.avro
>
> The input.avro has the following schema:
>
> http://pastebin.com/ZWU6qLWx
>
> I always get
>
> <file script.pig, line 3, column 0> Output Location Validation Failed
> for: 'xxx/output.avro' More info to follow:
> Please provide schema for Map field!
> Details at logfile: xxx/pig_1345735999390.log
>
> Log excerpt:
>
> Please provide schema for Map field!
>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>         at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:430)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.io.IOException: Please provide schema for Map field!
>         at