
Avro >> mail # dev >> [jira] [Created] (AVRO-1452) Problem when using AvroMultipleOutputs with multiple schemas


[jira] [Created] (AVRO-1452) Problem when using AvroMultipleOutputs with multiple schemas
Vladislav Spivak created AVRO-1452:
--------------------------------------

             Summary: Problem when using AvroMultipleOutputs with multiple schemas
                 Key: AVRO-1452
                 URL: https://issues.apache.org/jira/browse/AVRO-1452
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.7.6
         Environment: Any Platform
            Reporter: Vladislav Spivak
When using multiple named outputs with different key/value schemas, the last provided schema overrides all previous schema definitions after the first write attempt. This is caused by the following code in AvroMultipleOutputs.java:509
/*begin*/
    Job job = new Job(context.getConfiguration());
    ...
    setSchema(job, keySchema, valSchema);
    taskContext = createTaskAttemptContext(
        job.getConfiguration(), context.getTaskAttemptID());
/*end*/
Every time this code runs, the configuration instance passed to createTaskAttemptContext is the same one, because the Job constructor creates a new copy of the configuration only if it is not an instance of JobConf. As a result, the "avro.schema.output.XXX" properties are overwritten each time a new TaskAttemptContext is initialised, and a single Configuration instance is mistakenly shared by all TaskAttemptContexts.
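A minimal, self-contained Java sketch of the aliasing described above. It deliberately does not use Hadoop: Conf, JobConfModel and JobModel are hypothetical stand-ins that only model the copy-or-alias behaviour this report attributes to the Job constructor, and SCHEMA_KEY keeps the report's "avro.schema.output.XXX" placeholder rather than guessing the real property names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported bug; no Hadoop classes are used.
public class SharedConfDemo {
    // Placeholder key kept from the report ("avro.schema.output.XXX").
    static final String SCHEMA_KEY = "avro.schema.output.XXX";

    // Stand-in for Configuration: a mutable property map with a copy constructor.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) {
            props.putAll(other.props); // the copy gets its own map
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Stand-in for JobConf.
    static class JobConfModel extends Conf {}

    // Stand-in for Job, modelling the behaviour described in the report:
    // the argument is copied only when it is not already a JobConf.
    static class JobModel {
        final Conf conf;

        JobModel(Conf conf) {
            this.conf = (conf instanceof JobConfModel) ? conf : new Conf(conf);
        }
    }

    // Mirrors the quoted snippet: build a job, set the schema, and return the
    // configuration that would be passed to createTaskAttemptContext.
    static Conf contextFor(Conf taskConf, String schema) {
        JobModel job = new JobModel(taskConf);
        job.conf.set(SCHEMA_KEY, schema);
        return job.conf;
    }

    public static void main(String[] args) {
        Conf taskConf = new JobConfModel(); // the task's configuration is a JobConf
        Conf first = contextFor(taskConf, "SchemaA");
        Conf second = contextFor(taskConf, "SchemaB");
        System.out.println(first == second);       // true: one shared instance
        System.out.println(first.get(SCHEMA_KEY)); // SchemaB: SchemaA was overwritten
    }
}
```

Because both "contexts" are the very same object, the schema set for the first named output is silently replaced by the second one.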

Proposed fix:
a) use "Job.getInstance(Configuration conf)", or
b) call "new Job(new Configuration(context.getConfiguration()))"
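A minimal sketch of fix (b), using the same kind of hypothetical stand-ins (Conf, JobConfModel, JobModel are illustrative, not Hadoop's API): the task configuration is wrapped in a fresh copy before the job is created, so each named output gets its own configuration instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of proposed fix (b); no Hadoop classes are used.
public class CopiedConfDemo {
    // Placeholder key kept from the report ("avro.schema.output.XXX").
    static final String SCHEMA_KEY = "avro.schema.output.XXX";

    // Stand-in for Configuration: a mutable property map with a copy constructor.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) {
            props.putAll(other.props); // the copy gets its own map
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Stand-in for JobConf.
    static class JobConfModel extends Conf {}

    // Stand-in for Job: copies the argument only if it is not a JobConf.
    static class JobModel {
        final Conf conf;

        JobModel(Conf conf) {
            this.conf = (conf instanceof JobConfModel) ? conf : new Conf(conf);
        }
    }

    // Fix (b): copy the task configuration first, so the job never aliases it.
    static Conf contextFor(Conf taskConf, String schema) {
        JobModel job = new JobModel(new Conf(taskConf));
        job.conf.set(SCHEMA_KEY, schema);
        return job.conf;
    }

    public static void main(String[] args) {
        Conf taskConf = new JobConfModel(); // the task's configuration is a JobConf
        Conf first = contextFor(taskConf, "SchemaA");
        Conf second = contextFor(taskConf, "SchemaB");
        System.out.println(first == second);        // false: separate instances
        System.out.println(first.get(SCHEMA_KEY));  // SchemaA
        System.out.println(second.get(SCHEMA_KEY)); // SchemaB
    }
}
```

The point of the sketch is that the task's own configuration is never mutated; each named output writes its schema into a private copy, which is the goal of both proposed fixes.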

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

 