Re: Union of Records Issue

On 1/10/12 1:21 PM, "Uhlig, Hans" <[EMAIL PROTECTED]> wrote:

>I am creating a dynamic union of records as seen below but keep receiving
>an exception org.apache.avro.UnresolvedUnionException: Not in
>union
>Any reason why it deems the same schemas that created the
>union invalid for collection? Avro throws this with each record it tries
>to collect. An example of this working would be appreciated.
>
>Also, is there such a thing as a null record? The records I am assembling
>fit into a set instead of a Map, but I could find no elegant way outside
>of defining a record with a single field of null.
>
>inside ToolRunner
>Schema.Parser p = new Schema.Parser();
>
>ArrayList<Schema> keySchemas = new ArrayList<Schema>();
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s1.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s2.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s3.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s4.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s5.avsc")));
>
>Schema keySchema = Schema.createUnion(keySchemas);
>Schema valSchema = p.parse(AvroConverter.class.getResourceAsStream("null.avsc"));
>
>AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(keySchema,
>valSchema));
>
>Inside Mapper Setup:
>private static HashMap<String, Schema> keySchemas = new HashMap<String,
>Schema>();
>
>private static Schema valSchema;
>Schema.Parser p = new Schema.Parser();
>keySchemas.put("s1", p.parse(Map.class.getResourceAsStream("s1.avsc")));
>keySchemas.put("s2", p.parse(Map.class.getResourceAsStream("s2.avsc")));
>keySchemas.put("s3", p.parse(Map.class.getResourceAsStream("s3.avsc")));
>keySchemas.put("s4", p.parse(Map.class.getResourceAsStream("s4.avsc")));
>keySchemas.put("s5", p.parse(Map.class.getResourceAsStream("s5.avsc")));
>valSchema = p.parse(Map.class.getResourceAsStream("null.avsc"));
>
>Inside Map function:
>GenericData.Record r;
>if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s1"));
>} else if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s2"));
>}
>oc.collect(new AvroKey<GenericRecord>(r), new
>AvroValue<GenericRecord>(new GenericData.Record(valSchema)));

There is a bug in your mapper code: it checks for "s1" twice, so the
"s2" branch can never run. Perhaps the type is not "s1", and the code
ends up passing null to new AvroKey()? null is not a member of the
union, so it cannot be resolved.
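
One way to rule this out is to look the schema up by type name and fail
fast when nothing matches, rather than letting a null key reach the
collector. The sketch below is only an illustration: KeySchemaLookup and
schemaFor are hypothetical names, and the method is generic so that it
compiles without Avro (in the mapper the value type would be Schema). It
also assumes in.type is a String, in which case == compares references
rather than contents, so a map lookup or equals() is the safer test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro: picks the schema for an input type
// and fails fast instead of letting a null key reach the collector.
final class KeySchemaLookup {

    // Generic over S so the sketch compiles without Avro on the classpath;
    // in the mapper, S would be org.apache.avro.Schema.
    static <S> S schemaFor(Map<String, S> schemas, String type) {
        S schema = schemas.get(type); // HashMap compares keys with equals(), not ==
        if (schema == null) {
            throw new IllegalArgumentException(
                "No key schema registered for type: " + type);
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> schemas = new HashMap<>();
        // Placeholder strings standing in for the parsed schemas.
        schemas.put("s1", "schema for s1");
        schemas.put("s2", "schema for s2");

        System.out.println(schemaFor(schemas, "s2"));
        try {
            schemaFor(schemas, "s9"); // unknown type: rejected before any record is built
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the map function this would replace the if/else chain with something
like new GenericData.Record(schemaFor(keySchemas, in.type)).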

The code for creating an UnresolvedUnionException is:

  public UnresolvedUnionException(Schema unionSchema, Object unresolvedDatum) {
    super("Not in union "+unionSchema+": "+unresolvedDatum);
    this.unionSchema = unionSchema;
    this.unresolvedDatum = unresolvedDatum;
  }

So I would expect a more informative message than a bare "Not in
union": it should include the full union schema, followed by the
toString() of the datum that was passed in.

I would check a few things:
Print each schema you parsed with toString(true), which pretty-prints it as JSON.
Print the union schema, and make sure it looks right.
Validate that the GenericRecord you pass to the collector is never null.
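
These checks can be tried outside of MapReduce. The following is a rough
sketch, not your actual job: the record names S1 and S2 and the class
UnionCheck are made up for illustration, and the schemas are inlined
instead of being read from the .avsc files. It prints each schema and the
union, then asks GenericData to resolve a datum against the union, which
is where UnresolvedUnionException comes from.

```java
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.UnresolvedUnionException;
import org.apache.avro.generic.GenericData;

// Hypothetical stand-alone check; requires Avro on the classpath.
final class UnionCheck {

    // Returns the index of the union branch that matches the datum.
    // Throws UnresolvedUnionException when no branch matches.
    static int branchOf(Schema union, Object datum) {
        return GenericData.get().resolveUnion(union, datum);
    }

    public static void main(String[] args) {
        Schema.Parser p = new Schema.Parser();
        // Made-up, field-less records standing in for s1.avsc and s2.avsc.
        Schema s1 = p.parse("{\"type\":\"record\",\"name\":\"S1\",\"fields\":[]}");
        Schema s2 = p.parse("{\"type\":\"record\",\"name\":\"S2\",\"fields\":[]}");
        Schema union = Schema.createUnion(Arrays.asList(s1, s2));

        // Print each parsed schema and the union as pretty JSON.
        System.out.println(s1.toString(true));
        System.out.println(s2.toString(true));
        System.out.println(union.toString(true));

        // A record built from a member schema resolves to that branch.
        System.out.println(branchOf(union, new GenericData.Record(s2)));

        // A null datum has no branch here, since the union has no "null" type.
        try {
            branchOf(union, null);
        } catch (UnresolvedUnionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the record you build in the mapper resolves here but the job still
fails, comparing the printed union against the schema passed to
AvroJob.setMapOutputSchema would be the next thing I would look at.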
