Avro, mail # user - STORE USING AvroStorage - ignores Pig field names, only using their position


Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Ruslan Al-Fakikh 2013-11-17, 03:42
Including this last message to the Pig user list.
On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> Russell,
>
> Actually, this problem came up in a situation where I had the same names
> in the Pig relation schema and the Avro schema. It turned out that
> AvroStorage swaps the field values if the order is different.
> So, my impression is that it should work this way:
> 1) names correspond - then AvroStorage uses them
> 2) names do not correspond - then AvroStorage fails to store, or does some
> schema resolution as shown here:
> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
>
> Thanks
>
>
> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> How can Pig map from a to nonsense_name?
>>
>>
>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
>>
>>> Thanks, Russell!
>>>
>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
>>> map the Pig fields by name (not by field order), matching them to
>>> the names in the Avro schema?
>>>
>>> Thanks,
>>> Ruslan Al-Fakikh
>>>
>>>
>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Pig tuples have field order. Swap the order of the fields in your Avro
>>>> schema and try again.
>>>>
>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>  Hey guys,
>>>>
>>>> When I store with AvroStorage, the names of the Pig tuple fields are
>>>> completely ignored. The field values are written to the result file
>>>> by position only.
>>>> Here is a simplified test case:
>>>>
>>>> %declare WORKDIR `pwd`
>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
>>>> --this is built (manually with Maven) from the latest source:
>>>> --http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
>>>> REGISTER ../piggybankBuiltFromSource.jar
>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>>>>
>>>> --$ cat input.txt
>>>> --data_a data_b
>>>> --data_a data_b
>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>>>>
>>>> DESCRIBE inputs;
>>>> DUMP inputs;
>>>>
>>>> --output:
>>>> --inputs: {a: chararray,b: chararray}
>>>> --(data_a,data_b)
>>>> --(data_a,data_b)
>>>>
>>>> STORE inputs INTO 'output'
>>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>>>> "schema":
>>>> {
>>>>   "type" : "record",
>>>>   "name" : "my_schema",
>>>>   "namespace" : "com.my_namespace",
>>>>   "fields" : [
>>>>   {
>>>>     "name" : "b",
>>>>     "type" : "string"
>>>>   },
>>>>   {
>>>>     "name" : "nonsense_name",
>>>>     "type" : "string"
>>>>   }
>>>>   ]
>>>> }
>>>> }');
>>>>
>>>> --output
>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>>
>>>> AvroStorage is built from the latest piggybank code.
>>>> Setting the AvroStorage "debug": 5 parameter didn't help.
>>>>
>>>> $ pig -version
>>>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
>>>> compiled May 27 2013, 20:48:21
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thanks,
>>>> Ruslan Al-Fakikh
>>>>
>>>>
>>>
>>
>> --
>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>>
>
>
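
To make the two behaviors discussed in this thread concrete, here is a small sketch in plain Python. It is only an illustration, not AvroStorage's actual code: the helper names store_by_position and store_by_name are hypothetical, and the Pig tuple is modeled as a list of (name, value) pairs. The first helper reproduces the output seen in the test case; the second shows the name-based mapping I expected, failing when a name has no match.

```python
# Illustration only: plain Python, not the AvroStorage implementation.
# store_by_position / store_by_name are hypothetical helper names.

def store_by_position(pig_tuple, avro_field_names):
    """Pair each value with the Avro field at the same index, ignoring Pig names."""
    values = [value for _, value in pig_tuple]
    return dict(zip(avro_field_names, values))


def store_by_name(pig_tuple, avro_field_names):
    """Look up each Avro field by its name; fail if Pig has no such field."""
    by_name = dict(pig_tuple)
    missing = [name for name in avro_field_names if name not in by_name]
    if missing:
        raise KeyError(f"no Pig field for Avro field(s): {missing}")
    return {name: by_name[name] for name in avro_field_names}


# Same data as the test case: relation (a, b), Avro schema (b, nonsense_name).
pig_tuple = [("a", "data_a"), ("b", "data_b")]
avro_fields = ["b", "nonsense_name"]

# Positional mapping puts the value of "a" under the Avro field "b".
print(store_by_position(pig_tuple, avro_fields))

# Name-based mapping works when the names match, whatever the order.
print(store_by_name(pig_tuple, ["b", "a"]))
```

With the schema from the test case, store_by_name raises a KeyError for nonsense_name, which corresponds to option 2 above (fail to store) rather than silently writing the wrong value.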