Pig >> mail # user >> AvroStorage load and store, schema with maps


Johannes Schwenk 2012-08-23, 15:49
Cheolsoo Park 2012-08-23, 18:11
Cheolsoo Park 2012-08-23, 19:02
Re: AvroStorage load and store, schema with maps
Thank you very much!

I was confused because it seems to be OK to pass parameters to DEFINEd
functions. If that does not work, trying to pass them anyway should be a
syntax error. Maybe a parser exception could be thrown?

Thanks again!
Johannes
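
Applied to the script from the original question, the two-DEFINE approach
quoted below could look like the following sketch. The alias names AvroLoad
and AvroStoreSame are hypothetical, and the sketch assumes that the 'same'
option is accepted as a DEFINE constructor parameter in the same way as the
'schema' option in the quoted example; neither point is confirmed in this
thread.

```pig
-- One alias per constructor parameter set, as the Pig manual advises.
-- AvroLoad and AvroStoreSame are hypothetical alias names.
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStoreSame org.apache.pig.piggybank.storage.avro.AvroStorage('same', '$input');

loaded_data = LOAD '$input' USING AvroLoad;
-- Intended to reuse the schema of the input file for the output.
STORE loaded_data INTO '$output' USING AvroStoreSame;
```

The constructor parameters live only in the DEFINE statements, so nothing is
passed where the aliases are used.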
On 23.08.2012 21:02, Cheolsoo Park wrote:
> Actually, I found it in the Pig manual:
>
>> If you need to use different constructor parameters for different calls to
>> the function you will need to create multiple defines – one for each
>> parameter set.
>
>
> For example, this works:
>
>> DEFINE AvroStorageNoParam org.apache.pig.piggybank.storage.avro.AvroStorage();
>> DEFINE AvroStorageWithParam org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map","values" : "string"}');
>> loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
>> describe loaded_data;
>> STORE loaded_data INTO 'output' USING AvroStorageWithParam;
>
>
> Please see the usage section:
> http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs
>
> Thanks,
> Cheolsoo
>
> On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
>> Hi Johannes,
>>
>> I was able to reproduce your error with the following Avro schema:
>>
>>> {
>>>   "type" : "map",
>>>   "values" : "string"
>>> }
>>
>>
>> The issue is not in AvroStorage but in the DEFINE statement.
>>
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>>
>> AvroStorage has two constructors: one with no parameters and one with
>> parameters. To define the output Avro schema, the second one must be used.
>> But your DEFINE statement makes the first constructor always be used, so
>> the output Avro schema is never set. If you remove the DEFINE statement and
>> use the fully qualified name of AvroStorage, everything works. For example,
>>
>>> loaded_data = LOAD 'map.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>> describe loaded_data;
>>> STORE loaded_data INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '
>>> {
>>>   "type" : "map",
>>>   "values" : "string"
>>> }
>>> ');
>>
>>
>> Now the question is why DEFINE does not work here.
>>
>> Thanks,
>> Cheolsoo
>>
>>
>> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to execute the following pig script with pig-0.10.0 and yarn
>>> (cdh4.0.0):
>>>
>>> --
>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>> loaded_data = LOAD '$input' USING AvroStorage();
>>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>>> --
>>>
>>> I call pig this way:
>>>
>>> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar -file script.pig -param input=input.avro -param output=output.avro
>>>
>>> The input.avro has the following schema:
>>>
>>> http://pastebin.com/ZWU6qLWx
>>>
>>> I always get
>>>
>>> <file script.pig, line 3, column 0> Output Location Validation Failed
>>> for: 'xxx/output.avro' More info to follow:
>>> Please provide schema for Map field!
>>> Details at logfile: xxx/pig_1345735999390.log
>>>
>>> Log excerpt:
>>>
>>> Please provide schema for Map field!
>>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>>         at

Johannes Schwenk

Software Developer (Reporting)
________________________________________________________

ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Amtsgericht Düsseldorf under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
VAT ID No.: DE 218 858 434