Pig >> mail # user >> AvroStorage load and store, schema with maps


Johannes Schwenk 2012-08-23, 15:49
Cheolsoo Park 2012-08-23, 18:11
Re: AvroStorage load and store, schema with maps
Actually, I found it in Pig manual:

> If you need to use different constructor parameters for different calls to
> the function you will need to create multiple defines – one for each
> parameter set.
For example, this works:

DEFINE AvroStorageNoParam org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStorageWithParam org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map","values" : "string"}');
loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
describe loaded_data;
STORE loaded_data INTO 'output' USING AvroStorageWithParam;
Please see the usage section:
http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs
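
Applied to the script from the original question, the same idea might look like the sketch below. It is untested and makes assumptions: the alias names AvroLoad and AvroStoreSame are made up for illustration, and the thread does not confirm that the 'same' option resolves the map schema of that particular input file.

```pig
-- Sketch only: AvroLoad and AvroStoreSame are hypothetical alias names.
-- One DEFINE per constructor parameter set, as the Pig manual describes.
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStoreSame org.apache.pig.piggybank.storage.avro.AvroStorage('same', '$input');

-- The load goes through the alias bound to the no-argument constructor.
loaded_data = LOAD '$input' USING AvroLoad;
-- The store goes through the alias bound to the constructor with parameters.
STORE loaded_data INTO '$output' USING AvroStoreSame;
```

The $input and $output parameters are still supplied on the command line with -param, as in the original invocation.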

Thanks,
Cheolsoo

On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Johannes,
>
> I was able to reproduce your error with the following Avro schema:
>
> {
>   "type" : "map",
>   "values" : "string"
> }
>
>
> The issue is not in AvroStorage but in the DEFINE statement.
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
>
> AvroStorage has two constructors: one that takes no parameters and one that
> takes parameters. To set the output Avro schema, the second one must be used.
> But your DEFINE statement causes the first constructor to always be used, so
> the output Avro schema is never set. If you remove the DEFINE statement and
> use the fully qualified name of AvroStorage, everything works. For example:
>
> loaded_data = LOAD 'map.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe loaded_data;
> STORE loaded_data INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map", "values" : "string"}');
>
>
> Now the question is why DEFINE does not work here.
>
> Thanks,
> Cheolsoo
>
>
> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I'm trying to execute the following pig script with pig-0.10.0 and yarn
>> (cdh4.0.0):
>>
>> --
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>> loaded_data = LOAD '$input' USING AvroStorage();
>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>> --
>>
>> I call Pig this way:
>>
>> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar -file script.pig -param input=input.avro -param output=output.avro
>>
>> The input.avro has the following schema:
>>
>> http://pastebin.com/ZWU6qLWx
>>
>> I always get
>>
>> <file script.pig, line 3, column 0> Output Location Validation Failed
>> for: 'xxx/output.avro' More info to follow:
>> Please provide schema for Map field!
>> Details at logfile: xxx/pig_1345735999390.log
>>
>> Log excerpt:
>>
>> Please provide schema for Map field!
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>>         at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
Johannes Schwenk 2012-09-03, 11:16