Pig >> mail # user >> Pig Avrostorage Issue regarding Schema evaluation


Re: Pig Avrostorage Issue regarding Schema evaluation
Jackson is no longer needed, right? Or is it coming back in 0.11?

Russell Jurney http://datasyndrome.com

On Jan 9, 2013, at 10:26 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Milind,
>
> Please try this:
>
> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
> REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
> REGISTER contrib/piggybank/java/piggybank.jar
>
> employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> I have two Avro files in my input directory:
>
> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
> record_employee.avro
> {"name":"a","age":0,"dept":"b","office":"c","salary":0.0}
>
> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
> record_employee2.avro
> {"name":"a","age":0,"dept":"b","office":"c","salary":0}
>
> In record_employee.avro the salary field is a float, and in
> record_employee2.avro it is an int.
>
> The output looks as follows:
>
> ...
> employee: {name: chararray,age: int,dept: chararray,office:
> chararray,salary: float}
> ...
> (a,0,b,c,0.0)
> (a,0,b,c,0)
>
> Thanks,
> Cheolsoo
>
>
>
>
> On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:
>
>> Environment:
>>
>> Pig version: 0.11
>> Hadoop 0.23.6.0.1301071353
>>
>>
>> Script:
>>
>>
>> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
>> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>>
>> employee= load '/user/immilind/AvroData' using
>> org.apache.pig.piggybank.storage.avro.AvroStorage( );
>> dump employee;
>>
>>
>> Schemas :
>>
>> {
>> "type" : "record",
>> "name" : "employee",
>> "fields":[
>>    {"name" : "name", "type" : "string", "default" : "NU"},
>>    {"name" : "age", "type" : "int","default" : 0},
>>    {"name" : "dept", "type": "string","default" : "DU"},
>>    {"name" : "office", "type": "string","default" : "OU"},
>>    {"name" : "salary", "type": "float","default" : 0.0}
>> ]
>> }
>>
>> {
>> "type" : "record",
>> "name" : "employee",
>> "fields":[
>>    {"name" : "name", "type" : "string", "default" : "NU"},
>>    {"name" : "age", "type" : "int","default" : 0},
>>    {"name" : "dept", "type": "string","default" : "DU"},
>>    {"name" : "office", "type": "string","default" : "OU"},
>>    {"name" : "salary", "type": "int", "default" : 0}
>> ]
>> }
>>
>>
>> The two schemas differ only in the type of one field, salary. As per the
>> schema evolution/merging rules, I expect the "int" field to be loaded as
>> "float". Instead, the job fails due to a field mismatch.
>>
>> I am referring to :
>>
>> Similar thread named "Working with changing schemas (avro) in Pig"
>>
>> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>>
>> JIRA:
>> https://issues.apache.org/jira/browse/PIG-2579
>> How do I use the 'multiple_schemas' option with AvroStorage, as suggested
>> by this JIRA?
>>
>> Function mergeType indicating rules for primitive types
>>
>> https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>>
>>
>>
>> Can anybody suggest what is going wrong?
>>
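
The merge behaviour discussed in this thread (an "int" salary and a "float" salary resolving to "float") can be illustrated with a small sketch. This is not the code in AvroStorageUtils.mergeType; the function names, the rank table, and the error handling below are assumptions made for illustration. The widening order follows the numeric promotions in the Avro specification (int to long, float, or double; long to float or double; float to double).

```python
# Hypothetical sketch of merging primitive Avro types across two record schemas.
# Names and structure are illustrative assumptions, not Pig's implementation.

# Numeric widening order: a type may be promoted to any type with a higher rank.
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}


def merge_primitive(x, y):
    """Return the type that can hold values of both x and y, or raise."""
    if x == y:
        return x
    if x in NUMERIC_RANK and y in NUMERIC_RANK:
        # Pick the wider of the two numeric types.
        return x if NUMERIC_RANK[x] > NUMERIC_RANK[y] else y
    raise ValueError("cannot merge types %r and %r" % (x, y))


def merge_record_fields(a, b):
    """Merge two lists of (field name, type) pairs field by field."""
    if [name for name, _ in a] != [name for name, _ in b]:
        raise ValueError("field names differ")
    return [(name, merge_primitive(ta, tb)) for (name, ta), (_, tb) in zip(a, b)]


# The two "employee" schemas from the thread, reduced to (name, type) pairs.
schema_float = [("name", "string"), ("age", "int"), ("dept", "string"),
                ("office", "string"), ("salary", "float")]
schema_int = [("name", "string"), ("age", "int"), ("dept", "string"),
              ("office", "string"), ("salary", "int")]

merged = merge_record_fields(schema_float, schema_int)
print(merged)
```

Under these assumptions the merged salary field comes out as float, which agrees with the DESCRIBE output quoted above when AvroStorage is called with 'multiple_schemas'.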