Pig, mail # user - AvroStorage load and store, schema with maps


Re: AvroStorage load and store, schema with maps
Cheolsoo Park 2012-08-23, 19:02
Actually, I found it in the Pig manual:

> If you need to use different constructor parameters for different calls to
> the function you will need to create multiple defines – one for each
> parameter set.

For example, this works:

> DEFINE AvroStorageNoParam org.apache.pig.piggybank.storage.avro.AvroStorage();
> DEFINE AvroStorageWithParam org.apache.pig.piggybank.storage.avro.AvroStorage(
>     'schema', '{"type" : "map", "values" : "string"}');
> loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
> describe loaded_data;
> STORE loaded_data INTO 'output' USING AvroStorageWithParam;
Please see the usage section:
http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs
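
Applied to the script from the original question, the same idea might look like the sketch below. I have not run it. The alias names AvroLoad and AvroStoreSame are made up for illustration, and it assumes the 'same' option and the $input/$output parameters behave as intended in the original script.

```pig
-- One alias per constructor parameter set (alias names are hypothetical).
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStoreSame org.apache.pig.piggybank.storage.avro.AvroStorage('same', '$input');

-- Load with the no-argument constructor.
loaded_data = LOAD '$input' USING AvroLoad;

-- Store with the parameterized constructor; no parameters are passed here.
STORE loaded_data INTO '$output' USING AvroStoreSame;
```

The constructor arguments live in the DEFINE statements, so the aliases are used without arguments in LOAD and STORE.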

Thanks,
Cheolsoo

On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Johannes,
>
> I was able to reproduce your error with the following Avro schema:
>
>> {
>>   "type" : "map",
>>   "values" : "string"
>> }
>
>
> The issue is not in AvroStorage but in the DEFINE statement.
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
>
> AvroStorage has two constructors: one that takes no parameters and one that
> takes parameters. To set the output Avro schema, the second one must be used.
> But your DEFINE statement causes the first constructor to be used every time,
> so the output Avro schema is never set. If you remove the DEFINE statement and
> use the fully qualified name of AvroStorage, everything works. For example,
>
>> loaded_data = LOAD 'map.avro' USING
>>     org.apache.pig.piggybank.storage.avro.AvroStorage();
>> describe loaded_data;
>> STORE loaded_data INTO 'output' USING
>>     org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '
>> {
>>   "type" : "map",
>>   "values" : "string"
>> }
>> ');
>
>
> Now the question is why DEFINE does not work here.
>
> Thanks,
> Cheolsoo
>
>
> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <
> [EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I'm trying to execute the following pig script with pig-0.10.0 and yarn
>> (cdh4.0.0):
>>
>> --
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>> loaded_data = LOAD '$input' USING AvroStorage();
>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>> --
>>
>> I call pig this way:
>>
>> pig \
>>   -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar \
>>   -file script.pig -param input=input.avro -param output=output.avro
>>
>> The input.avro has the following schema:
>>
>> http://pastebin.com/ZWU6qLWx
>>
>> I always get
>>
>> <file script.pig, line 3, column 0> Output Location Validation Failed
>> for: 'xxx/output.avro' More info to follow:
>> Please provide schema for Map field!
>> Details at logfile: xxx/pig_1345735999390.log
>>
>> Log excerpt:
>>
>> Please provide schema for Map field!
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>>         at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)