Avro, mail # user - STORE USING AvroStorage - ignores Pig field names, only using their position


Ruslan Al-Fakikh 2013-11-17, 02:19
Hey guys,

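When I store with AvroStorage, the Pig tuple field names are completely
ignored. The field values are written to the result file purely by
position, regardless of the field names in the Avro schema. In the test
case below, the schema field "b" ends up holding the value of Pig field "a".
Here is a simplified test case: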

%declare WORKDIR `pwd`
REGISTER ../../../../lib/external/avro-1.7.4.jar
REGISTER ../../../../lib/external/json-simple-1.1.jar
--this is built (manually with Maven) from the latest source:
--http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
REGISTER ../piggybankBuiltFromSource.jar
REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar

--$ cat input.txt
--data_a data_b
--data_a data_b
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

DESCRIBE inputs;
DUMP inputs;

--output:
--inputs: {a: chararray,b: chararray}
--(data_a,data_b)
--(data_a,data_b)

STORE inputs INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  {
    "name" : "b",
    "type" : "string"
  },
  {
    "name" : "nonsense_name",
    "type" : "string"
  }
  ]
}
}');

--output
--$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
--{"b":"data_a","nonsense_name":"data_b"}
--{"b":"data_a","nonsense_name":"data_b"}

AvroStorage is built from the latest piggybank code.
Setting the AvroStorage "debug": 5 parameter didn't help.
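
Given the positional behavior shown above, one possible stopgap is to reorder the Pig fields so that their positions line up with the schema before storing. This is only a sketch under that assumption, not a fix for the name matching itself. The relation name "reordered" and the output path 'output_reordered' are made up for illustration, and the REGISTER statements from the script above are still required.

```pig
-- Sketch only: assumes the same REGISTER statements as the script above.
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Put b first so that position 0 lines up with schema field "b".
-- "reordered" is a hypothetical relation name.
reordered = FOREACH inputs GENERATE b, a;

-- Same schema as above, written on one line.
-- 'output_reordered' is a hypothetical output path.
STORE reordered INTO 'output_reordered'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{"schema": {"type": "record", "name": "my_schema", "namespace": "com.my_namespace", "fields": [{"name": "b", "type": "string"}, {"name": "nonsense_name", "type": "string"}]}}');
```

If the mapping really is positional, schema field "b" should then receive the value of Pig field "b". The underlying issue remains: the field order in every STORE has to be kept in sync with the schema by hand.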

$ pig -version
Apache Pig version 0.11.0-cdh4.3.0 (rexported)
compiled May 27 2013, 20:48:21

Any help would be appreciated.

Thanks,
Ruslan Al-Fakikh