Avro >> mail # user >> Union of Records Issue


Uhlig, Hans 2012-01-10, 21:21
Doug Cutting 2012-01-10, 21:41
Vyacheslav Zholudev 2012-01-10, 21:45
Re: Union of Records Issue

On 1/10/12 1:21 PM, "Uhlig, Hans" <[EMAIL PROTECTED]> wrote:

>I am creating a dynamic union of records as seen below but keep receiving
>an exception org.apache.avro.UnresolvedUnionException: Not in
>union
>Any reason why it deems the same schemas that created the
>union invalid for collection? Avro throws this with each record it tries
>to collect. An example of this working would be appreciated.
>
>Also, is there such a thing as a null record? The records I am assembling
>fit into a set instead of a Map, but I could find no elegant way outside
>of defining a record with a single field of null.
>
>Inside ToolRunner:
>Schema.Parser p = new Schema.Parser();
>        
>ArrayList<Schema> keySchemas = new ArrayList<Schema>();
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s1.avsc"))
>);
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s2.avsc"))
>);
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s3.avsc"))
>);
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s4.avsc"))
>);
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s5.avsc"))
>);
>        
>Schema keySchema = Schema.createUnion(keySchemas);
>Schema valSchema =
>p.parse(AvroConverter.class.getResourceAsStream("null.avsc"));
>
>AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(keySchema,
>valSchema));
>
>Inside Mapper Setup:
>private static HashMap<String, Schema> keySchemas = new HashMap<String,
>Schema>();
>
>private static Schema valSchema;
>Schema.Parser p = new Schema.Parser();
>keySchemas.put("s1", p.parse(Map.class.getResourceAsStream("s1.avsc")));
>keySchemas.put("s2", p.parse(Map.class.getResourceAsStream("s2.avsc")));
>keySchemas.put("s3", p.parse(Map.class.getResourceAsStream("s3.avsc")));
>keySchemas.put("s4", p.parse(Map.class.getResourceAsStream("s4.avsc")));
>keySchemas.put("s5", p.parse(Map.class.getResourceAsStream("s5.avsc")));
>valSchema = p.parse(Map.class.getResourceAsStream("null.avsc"));
>
>Inside Map function:
>GenericData.Record r;
>if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s1"));
>} else if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s2"));
>}
>oc.collect(new AvroKey<GenericRecord>(r), new
>AvroValue<GenericRecord>(new GenericData.Record(valSchema)));
There is a bug in your mapper code: it checks for "s1" twice, so the "s2"
branch is never taken. Perhaps the type is not "s1" and the code ends up
passing null to new AvroKey()? null will not be in the union.
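
One way to avoid both the duplicated branch and the null record is to look the schema up by type and fail fast when nothing is registered. The following is only a minimal sketch, not the poster's code: the class SchemaDispatch and the helper schemaFor are hypothetical names, and plain strings stand in for the parsed org.apache.avro.Schema objects so that the example runs without Avro on the classpath. It also shows why comparing the type with == is fragile in Java: == compares object references, while equals() (which HashMap.get relies on) compares contents.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaDispatch {

    // Hypothetical helper: returns the schema registered for a record type.
    // Throws instead of returning null, so an unknown or misspelled type is
    // reported here rather than surfacing later as "Not in union".
    public static <S> S schemaFor(Map<String, S> schemas, String type) {
        S schema = schemas.get(type);
        if (schema == null) {
            throw new IllegalArgumentException("No schema registered for type: " + type);
        }
        return schema;
    }

    public static void main(String[] args) {
        // Stand-in values: in the mapper these would be parsed Schema objects.
        Map<String, String> keySchemas = new HashMap<>();
        keySchemas.put("s1", "schema-for-s1");
        keySchemas.put("s2", "schema-for-s2");

        // new String(...) yields a distinct object, much like a type string
        // read from input data at runtime.
        String type = new String("s2");

        System.out.println(type == "s2");                // false: reference comparison
        System.out.println(type.equals("s2"));           // true: content comparison
        System.out.println(schemaFor(keySchemas, type)); // schema-for-s2
    }
}
```

In the mapper, the value returned by such a lookup would be passed to new GenericData.Record(...), so the record handed to AvroKey is always built from one of the schemas that went into the union.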

The code for creating an UnresolvedUnionException is:

  public UnresolvedUnionException(Schema unionSchema, Object unresolvedDatum) {
    super("Not in union "+unionSchema+": "+unresolvedDatum);
    this.unionSchema = unionSchema;
    this.unresolvedDatum = unresolvedDatum;
  }

So I would expect a more informative error message, one that includes the
full union schema and the toString() of the datum that was passed in.

I would check a few things:
Print each schema you parsed with toString(true), which renders it as pretty-printed JSON.
Print the union schema, and make sure it looks right.
Validate that the GenericRecord you pass to the collector is never null.
