Avro >> mail # user >> STORE USING AvroStorage - ignores Pig field names, only using their position


Re: STORE USING AvroStorage - ignores Pig field names, only using their position
How can Pig map from a to nonsense_name?
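
A possible workaround, sketched under the assumption that AvroStorage pairs Pig fields with Avro fields purely by position (as the output quoted below suggests): reorder the fields with a FOREACH before the STORE so that their order matches the schema. The relation name "reordered" is made up for illustration, and the REGISTER statements and input.txt are assumed to be the same as in the quoted script.

```pig
-- Assumes the same REGISTER statements and input.txt as the quoted script.
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Put the Pig fields in the same order as the Avro schema fields:
-- b first, then a. The alias nonsense_name is only for readability;
-- per this thread, the pairing is done by position, not by name.
reordered = FOREACH inputs GENERATE b, a AS nonsense_name;

STORE reordered INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  { "name" : "b", "type" : "string" },
  { "name" : "nonsense_name", "type" : "string" }
  ]
}
}');
```

If the positional assumption holds, the value of Pig field b should end up in the Avro field b, and the value of a in nonsense_name.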

On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:

> Thanks, Russell!
>
> Do you mean that this is the expected behavior? Shouldn't AvroStorage map
> the Pig fields by name (not by field order), matching them to the
> names in the Avro schema?
>
> Thanks,
> Ruslan Al-Fakikh
>
>
> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Pig tuples have field order. Swap the order of the fields in your Avro
>> schema and try again.
>>
>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>>
>> Hey guys,
>>
>> When I store with AvroStorage, the names of the Pig tuple fields are
>> completely ignored. The field values are written to the result file
>> only by their position.
>> Here is a simplified test case:
>>
>> %declare WORKDIR `pwd`
>> REGISTER ../../../../lib/external/avro-1.7.4.jar
>> REGISTER ../../../../lib/external/json-simple-1.1.jar
>> --this is built (manually with Maven) from the latest source:
>> -- http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
>> REGISTER ../piggybankBuiltFromSource.jar
>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>>
>> --$ cat input.txt
>> --data_a data_b
>> --data_a data_b
>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>>
>> DESCRIBE inputs;
>> DUMP inputs;
>>
>> --output:
>> --inputs: {a: chararray,b: chararray}
>> --(data_a,data_b)
>> --(data_a,data_b)
>>
>> STORE inputs INTO 'output'
>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>> "schema":
>> {
>>   "type" : "record",
>>   "name" : "my_schema",
>>   "namespace" : "com.my_namespace",
>>   "fields" : [
>>   {
>>     "name" : "b",
>>     "type" : "string"
>>   },
>>   {
>>     "name" : "nonsense_name",
>>     "type" : "string"
>>   }
>>   ]
>> }
>> }');
>>
>> --output
>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
>> --{"b":"data_a","nonsense_name":"data_b"}
>> --{"b":"data_a","nonsense_name":"data_b"}
>>
>> AvroStorage is built from the latest piggybank code.
>> Setting the AvroStorage "debug": 5 parameter didn't help.
>>
>> $ pig -version
>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
>> compiled May 27 2013, 20:48:21
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Ruslan Al-Fakikh
>>
>>
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com