Avro >> mail # user >> STORE USING AvroStorage - ignores Pig field names, only using their position


STORE USING AvroStorage - ignores Pig field names, only using their position
Hey guys,

When I store with AvroStorage, the Pig tuple field names are completely
ignored. The field values are written to the output file purely by
position and are not matched to the Avro schema fields by name.
Here is a simplified test case:

%declare WORKDIR `pwd`
REGISTER ../../../../lib/external/avro-1.7.4.jar
REGISTER ../../../../lib/external/json-simple-1.1.jar
--this is built (manually with Maven) from the latest source:
--http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
REGISTER ../piggybankBuiltFromSource.jar
REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar

--$ cat input.txt
--data_a data_b
--data_a data_b
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

DESCRIBE inputs;
DUMP inputs;

--output:
--inputs: {a: chararray,b: chararray}
--(data_a,data_b)
--(data_a,data_b)

STORE inputs INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  {
    "name" : "b",
    "type" : "string"
  },
  {
    "name" : "nonsense_name",
    "type" : "string"
  }
  ]
}
}');

--output
--$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
--{"b":"data_a","nonsense_name":"data_b"}
--{"b":"data_a","nonsense_name":"data_b"}

AvroStorage is built from the latest piggybank code.
Setting the AvroStorage "debug": 5 parameter didn't help.

$ pig -version
Apache Pig version 0.11.0-cdh4.3.0 (rexported)
compiled May 27 2013, 20:48:21
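
One possible stopgap, assuming AvroStorage really does map tuple values to
schema fields purely by position (as the output above suggests), is to
project the Pig fields into schema order before storing. This is only a
sketch: the alias reordered and the output path 'output_reordered' are made
up for illustration, and it does not make AvroStorage match fields by name.

```pig
-- Same jars as in the test case above.
REGISTER ../../../../lib/external/avro-1.7.4.jar
REGISTER ../../../../lib/external/json-simple-1.1.jar
REGISTER ../piggybankBuiltFromSource.jar
REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar

inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Assumption: values are written by position, so put Pig field b first
-- to line it up with the first schema field, "b".
-- (reordered and 'output_reordered' are hypothetical names.)
reordered = FOREACH inputs GENERATE b, a;

STORE reordered INTO 'output_reordered'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  { "name" : "b", "type" : "string" },
  { "name" : "nonsense_name", "type" : "string" }
  ]
}
}');
```

If the positional assumption holds, the "b" field in the result should then
carry the value of Pig field b, but every STORE would need its projection
kept in sync with the schema by hand.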

Any help would be appreciated.

Thanks,
Ruslan Al-Fakikh