
Avro, mail # user - STORE USING AvroStorage - ignores Pig field names, only using their position


Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Russell Jurney 2013-11-17, 02:53
Pig tuples are ordered, and AvroStorage fills the Avro fields from them by position. Swap the order of the fields in your Avro schema so that it matches the Pig schema, and try again.
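A minimal sketch of that fix, applied to the test case quoted below. The Avro schema lists its fields in the same order as the Pig schema `(a, b)`, so position and name agree. The jar paths are copied from the original script. The output directory `output_fixed` is a hypothetical name. The sketch assumes the positional mapping observed in the question and has not been run against this exact piggybank build.

```pig
REGISTER ../../../../lib/external/avro-1.7.4.jar
REGISTER ../../../../lib/external/json-simple-1.1.jar
REGISTER ../piggybankBuiltFromSource.jar
REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar

inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- AvroStorage (as observed in the question) fills Avro fields by position,
-- so the schema below lists "a" first and "b" second, mirroring the Pig tuple.
-- If the Avro schema cannot be reordered, reorder the tuple instead, e.g.:
--   reordered = FOREACH inputs GENERATE b, a;
STORE inputs INTO 'output_fixed'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  { "name" : "a", "type" : "string" },
  { "name" : "b", "type" : "string" }
  ]
}
}');
```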

> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> Hey guys,
>
> When I store with AvroStorage, the names of the Pig tuple fields are completely ignored. The field values are written to the output file purely by their position.
> Here is a simplified test case:
>
> %declare WORKDIR `pwd`
> REGISTER ../../../../lib/external/avro-1.7.4.jar
> REGISTER ../../../../lib/external/json-simple-1.1.jar
> --this is built (manually with Maven) from the latest source:
> --http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
> REGISTER ../piggybankBuiltFromSource.jar
> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>
> --$ cat input.txt
> --data_a data_b
> --data_a data_b
> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>
> DESCRIBE inputs;
> DUMP inputs;
>
> --output:
> --inputs: {a: chararray,b: chararray}
> --(data_a,data_b)
> --(data_a,data_b)
>
> STORE inputs INTO 'output'
>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> "schema":
> {
>   "type" : "record",
>   "name" : "my_schema",
>   "namespace" : "com.my_namespace",
>   "fields" : [
>   {
>     "name" : "b",
>     "type" : "string"
>   },
>   {
>     "name" : "nonsense_name",
>     "type" : "string"
>   }
>   ]
> }
> }');
>
> --output
> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
> --{"b":"data_a","nonsense_name":"data_b"}
> --{"b":"data_a","nonsense_name":"data_b"}
>
> AvroStorage is built from the latest piggybank code.
> Passing the AvroStorage parameter "debug": 5 didn't help.
>
> $ pig -version
> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
> compiled May 27 2013, 20:48:21
>
> Any help would be appreciated.
>
> Thanks,
> Ruslan Al-Fakikh