Avro, mail # user - STORE USING AvroStorage - ignores Pig field names, only using their position


Ruslan Al-Fakikh 2013-11-17, 02:19
Russell Jurney 2013-11-17, 02:53
Ruslan Al-Fakikh 2013-11-17, 03:16
Russell Jurney 2013-11-17, 03:17
Ruslan Al-Fakikh 2013-11-17, 03:40
Ruslan Al-Fakikh 2013-11-17, 03:42

Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Russell Jurney 2013-11-17, 04:01
I think the expected behavior of AvroStorage is to write the tuple's
fields in the order they appear in the tuple. So to fix your problem, swap
the order of b/nonsense_name in your Avro schema.

Otherwise I can't see a way to map from b to nonsense_name at all. Pig
can't know how to do that without referencing tuple field order.
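
To make the two behaviors discussed in this thread concrete, here is a minimal Python sketch. It is an illustration only: store_by_position and store_by_name are hypothetical helpers, not part of Pig, piggybank, or Avro. The first models the positional pairing reported in the test case quoted below; the second models the name-based pairing Ruslan proposes, assuming it would fail when a name has no match.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of
# Pig, piggybank, or Avro. They model the two mappings discussed in the thread.


def store_by_position(row, avro_fields):
    """Pair each tuple value with the Avro field at the same index."""
    if len(row) != len(avro_fields):
        raise ValueError("tuple and Avro schema have different field counts")
    return dict(zip(avro_fields, row))


def store_by_name(pig_fields, row, avro_fields):
    """Pair each tuple value with the Avro field of the same name, or fail."""
    by_name = dict(zip(pig_fields, row))
    missing = [name for name in avro_fields if name not in by_name]
    if missing:
        raise KeyError(f"no Pig field matches Avro field(s): {missing}")
    return {name: by_name[name] for name in avro_fields}


# Values taken from the test case in the original message.
pig_fields = ["a", "b"]
row = ("data_a", "data_b")
avro_fields = ["b", "nonsense_name"]

# Positional pairing reproduces the output reported in the thread.
positional = store_by_position(row, avro_fields)
print(positional)  # {'b': 'data_a', 'nonsense_name': 'data_b'}

# Name-based pairing cannot resolve "nonsense_name", so it raises.
try:
    store_by_name(pig_fields, row, avro_fields)
    name_based_failed = False
except KeyError:
    name_based_failed = True
print(name_based_failed)  # True
```

With positional pairing, the names in the Pig relation never enter the picture, which is why the value of a ends up under b. Under the name-based variant, a schema listing the same names in a different order would still receive the right values.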

On Sat, Nov 16, 2013 at 7:42 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> Including this last message to the Pig user list.
>
>
> On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
>> Russell,
>>
>> Actually, this problem came from a situation where I had the same names
>> in the Pig relation schema and the Avro schema, and it turned out that
>> AvroStorage switches the fields if their order is different.
>> So, my impression is that it should work this way:
>> 1) names correspond - then AvroStorage uses them
>> 2) names do not correspond - then AvroStorage fails to store, or does some
>> schema resolution as shown here:
>> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
>>
>> Thanks
>>
>>
>> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>
>>> How can Pig map from a to nonsense_name?
>>>
>>>
>>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
>>>
>>>> Thanks, Russell!
>>>>
>>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
>>>> map the Pig fields by name (not by field order), matching them to
>>>> the names in the Avro schema?
>>>>
>>>> Thanks,
>>>> Ruslan Al-Fakikh
>>>>
>>>>
>>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Pig tuples have field order. Swap the order of the fields in your Avro
>>>>> schema and try again.
>>>>>
>>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>  Hey guys,
>>>>>
>>>>> When I store with AvroStorage, the names of the Pig tuple fields are
>>>>> completely ignored. The field values are written to the result file
>>>>> purely by position.
>>>>> Here is a simplified test case:
>>>>>
>>>>> %declare WORKDIR `pwd`
>>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
>>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
>>>>> --this is built (manually with Maven) from the latest source:
>>>>> --http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
>>>>> REGISTER ../piggybankBuiltFromSource.jar
>>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
>>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>>>>>
>>>>> --$ cat input.txt
>>>>> --data_a data_b
>>>>> --data_a data_b
>>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>>>>>
>>>>> DESCRIBE inputs;
>>>>> DUMP inputs;
>>>>>
>>>>> --output:
>>>>> --inputs: {a: chararray,b: chararray}
>>>>> --(data_a,data_b)
>>>>> --(data_a,data_b)
>>>>>
>>>>> STORE inputs INTO 'output'
>>>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>>>>> "schema":
>>>>> {
>>>>>   "type" : "record",
>>>>>   "name" : "my_schema",
>>>>>   "namespace" : "com.my_namespace",
>>>>>   "fields" : [
>>>>>   {
>>>>>     "name" : "b",
>>>>>     "type" : "string"
>>>>>   },
>>>>>   {
>>>>>     "name" : "nonsense_name",
>>>>>     "type" : "string"
>>>>>   }
>>>>>   ]
>>>>> }
>>>>> }');
>>>>>
>>>>> --output
>>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
>>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>>>
>>>>> AvroStorage is built from the latest piggybank code.
>>>>> Setting the AvroStorage "debug": 5 parameter didn't help.
>>>>>
>>>>> $ pig -version
>>>>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
>>>>> compiled May 27 2013, 20:48:21
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>> Thanks,
>>>>> Ruslan Al-Fakikh
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>>>
>>
>>
>
--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com