Avro user mailing list: STORE USING AvroStorage - ignores Pig field names, only using their position


Ruslan Al-Fakikh 2013-11-17, 02:19
Russell Jurney 2013-11-17, 02:53
Ruslan Al-Fakikh 2013-11-17, 03:16
Russell Jurney 2013-11-17, 03:17
Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Russell,

Actually, this problem came up in a case where the Pig relation schema and
the Avro schema used the same field names, and it turned out that
AvroStorage swaps the field values when the field order differs.
So my impression is that it should work this way:
1) If the names correspond, AvroStorage uses them.
2) If the names do not correspond, AvroStorage either fails to store or
performs schema resolution as described here:
http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
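A minimal Python sketch of the two proposed behaviours, assuming a Pig tuple is modelled as a list of field names plus a tuple of values, and the Avro schema as a list of field dicts that may carry a "default". The function names are hypothetical and this is not AvroStorage code. Rule 1 refuses to store unless the names match exactly; rule 2 follows the record-resolution rules in the linked spec: fields are matched by name, a missing field falls back to its default, a missing field without a default is an error, and extra fields are ignored.

```python
from typing import Any, Dict, List, Sequence

# Sentinel meaning "no default declared" (null is a legal Avro default).
_NO_DEFAULT = object()


def store_by_name(pig_fields: Sequence[str], values: Sequence[Any],
                  avro_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rule 1: the Pig and Avro field names must correspond exactly."""
    avro_names = [f["name"] for f in avro_fields]
    if sorted(pig_fields) != sorted(avro_names):
        raise ValueError(f"Pig fields {list(pig_fields)} do not match "
                         f"Avro fields {avro_names}")
    by_name = dict(zip(pig_fields, values))
    # Emit values in Avro schema order, looked up by name, not by position.
    return {name: by_name[name] for name in avro_names}


def store_with_resolution(pig_fields: Sequence[str], values: Sequence[Any],
                          avro_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rule 2: Avro-style record resolution (match by name, use defaults)."""
    by_name = dict(zip(pig_fields, values))
    record: Dict[str, Any] = {}
    for field in avro_fields:
        name = field["name"]
        if name in by_name:
            record[name] = by_name[name]
        elif field.get("default", _NO_DEFAULT) is not _NO_DEFAULT:
            record[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Pig fields that have no counterpart in the Avro schema are dropped.
    return record
```

With the schema from the original test case (fields "b" and "nonsense_name"), rule 1 would raise a ValueError instead of silently writing data_a under "b".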

Thanks
On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> How can Pig map from a to nonsense_name?
>
>
> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
>
>> Thanks, Russell!
>>
>> Do you mean that this is the expected behavior? Shouldn't AvroStorage map
>> the Pig fields by name (not by field order), matching them to the names
>> in the Avro schema?
>>
>> Thanks,
>> Ruslan Al-Fakikh
>>
>>
>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>
>>> Pig tuples have field order. Swap the order of the fields in your avro
>>> schema and try again.
>>>
>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>>>
>>>  Hey guys,
>>>
>>> When I store with AvroStorage, the names of the Pig tuple fields are
>>> completely ignored. The field values are written to the output file
>>> purely by position.
>>> Here is a simplified test case:
>>>
>>> %declare WORKDIR `pwd`
>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
>>> --this is built (manually with Maven) from the latest source:
>>> -- http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
>>> REGISTER ../piggybankBuiltFromSource.jar
>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>>>
>>> --$ cat input.txt
>>> --data_a data_b
>>> --data_a data_b
>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>>>
>>> DESCRIBE inputs;
>>> DUMP inputs;
>>>
>>> --output:
>>> --inputs: {a: chararray,b: chararray}
>>> --(data_a,data_b)
>>> --(data_a,data_b)
>>>
>>> STORE inputs INTO 'output'
>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>>> "schema":
>>> {
>>>   "type" : "record",
>>>   "name" : "my_schema",
>>>   "namespace" : "com.my_namespace",
>>>   "fields" : [
>>>   {
>>>     "name" : "b",
>>>     "type" : "string"
>>>   },
>>>   {
>>>     "name" : "nonsense_name",
>>>     "type" : "string"
>>>   }
>>>   ]
>>> }
>>> }');
>>>
>>> --output
>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
>>> --{"b":"data_a","nonsense_name":"data_b"}
>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>
>>> AvroStorage is built from the latest piggybank code.
>>> Setting the AvroStorage "debug": 5 parameter didn't help.
>>>
>>> $ pig -version
>>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
>>> compiled May 27 2013, 20:48:21
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks,
>>> Ruslan Al-Fakikh
>>>
>>>
>>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>
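To see why Russell's suggestion to swap the field order works, here is a small Python model of the behaviour in the test case, assuming (as the tojson output in the quoted message suggests) that AvroStorage puts the i-th tuple value into the i-th Avro field regardless of names. `store_by_position` and `reorder` are hypothetical helpers for illustration, not piggybank code; `reorder` stands in for either reordering the Avro schema fields or projecting the Pig fields in schema order before the STORE.

```python
from typing import Any, Dict, List, Sequence, Tuple


def store_by_position(values: Sequence[Any],
                      avro_field_names: List[str]) -> Dict[str, Any]:
    """Models the reported behaviour: Pig field names play no part."""
    # zip pairs values with Avro names strictly by index.
    return dict(zip(avro_field_names, values))


def reorder(pig_fields: Sequence[str], values: Sequence[Any],
            wanted: Sequence[str]) -> Tuple[Any, ...]:
    """Re-emit the tuple's values in the order given by `wanted` names,
    so that position i lines up with the i-th Avro field."""
    by_name = dict(zip(pig_fields, values))
    return tuple(by_name[name] for name in wanted)
```

Applied to the tuple (data_a, data_b), `store_by_position` reproduces the reported {"b": "data_a", "nonsense_name": "data_b"}; reordering to (b, a) first puts data_b under "b". Under a purely positional mapping, only the order of values matters.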
Ruslan Al-Fakikh 2013-11-17, 03:42
Russell Jurney 2013-11-17, 04:01