Uhlig, Hans 2012-01-10, 21:21
Doug Cutting 2012-01-10, 21:41
Vyacheslav Zholudev 2012-01-10, 21:45
Re: Union of Records Issue
Scott Carey 2012-01-10, 21:52
On 1/10/12 1:21 PM, "Uhlig, Hans" <[EMAIL PROTECTED]> wrote:

>I am creating a dynamic union of records as seen below but keep receiving
>an exception: org.apache.avro.UnresolvedUnionException: Not in union
>
>Any reason why it deems the same schemas that created the union invalid
>for collection? Avro throws this with each record it tries to collect.
>An example of this working would be appreciated.
>
>Also, is there such a thing as a nullrecord? The records I am assembling
>fit into a set instead of a Map, but I could find no elegant way outside
>of defining a record with a single field of null.
>
>Inside ToolRunner:
>Schema.Parser p = new Schema.Parser();
>
>ArrayList<Schema> keySchemas = new ArrayList<Schema>();
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s1.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s2.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s3.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s4.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s5.avsc")));
>
>Schema keySchema = Schema.createUnion(keySchemas);
>Schema valSchema =
>p.parse(AvroConverter.class.getResourceAsStream("null.avsc"));
>
>AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(keySchema, valSchema));
>
>Inside Mapper Setup:
>private static HashMap<String, Schema> keySchemas = new HashMap<String, Schema>();
>
>private static Schema valSchema;
>Schema.Parser p = new Schema.Parser();
>keySchemas.put("s1", p.parse(Map.class.getResourceAsStream("s1.avsc")));
>keySchemas.put("s2", p.parse(Map.class.getResourceAsStream("s2.avsc")));
>keySchemas.put("s3", p.parse(Map.class.getResourceAsStream("s3.avsc")));
>keySchemas.put("s4", p.parse(Map.class.getResourceAsStream("s4.avsc")));
>keySchemas.put("s5", p.parse(Map.class.getResourceAsStream("s5.avsc")));
>valSchema = p.parse(Map.class.getResourceAsStream("null.avsc"));
>
>Inside Map function:
>GenericData.Record r;
>if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s1"));
>} else if(in.type=="s1") {
>r = new GenericData.Record(keySchemas.get("s2"));
>}
>oc.collect(new AvroKey<GenericRecord>(r), new
>AvroValue<GenericRecord>(new GenericData.Record(valSchema)));

There is a bug in your mapper code: it checks "s1" twice. Perhaps in.type is
not "s1" and the code ends up passing null to new AvroKey()? null will not be
in the union.

The code for creating an UnresolvedUnionException is:

  public UnresolvedUnionException(Schema unionSchema, Object unresolvedDatum) {
    super("Not in union "+unionSchema+": "+unresolvedDatum);
    this.unionSchema = unionSchema;
    this.unresolvedDatum = unresolvedDatum;
  }

So I would expect a more informative error, including the full union schema
and the toString() of the object that was passed in.

I would check a few things:

Print each schema you parsed with toString(), which renders it as JSON.
Print the union schema, and make sure it looks right.
Validate that the GenericRecord you pass to the collector is never null.

>
>
>Avro throws a Union Exception every time I pass in a record. Any reason
>why it deems the same schemas that created the union invalid for
>collection?
>
>org.apache.avro.UnresolvedUnionException: Not in union
>
>I am creating a dynamic union of records as seen below
>
>Inside ToolRunner:
>Schema.Parser p = new Schema.Parser();
>
>ArrayList<Schema> keySchemas = new ArrayList<Schema>();
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s1.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s2.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s3.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s4.avsc")));
>keySchemas.add(p.parse(AvroConverter.class.getResourceAsStream("s5.avsc")));
>
>Schema keySchema = Schema.createUnion(keySchemas);
>Schema valSchema =
>p.parse(AvroConverter.class.getResourceAsStream("null.avsc"));
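
For what it's worth, one way to keep the duplicated "s1" branch from creeping in is to replace the if/else chain with a map lookup that fails fast on an unknown type. The following is only a minimal sketch, not the poster's actual mapper: the class name RecordTypeDispatch and the method schemaFor are hypothetical, and plain strings stand in for the parsed Schema objects so the example runs without Avro on the classpath. A side benefit is that a map lookup compares keys with equals(), whereas the quoted code compares strings with ==, which tests reference identity in Java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: looks up the schema for a record type tag and
// refuses to return null, so a bad tag fails before anything is collected.
class RecordTypeDispatch {
    // Stand-in for the keySchemas map in the thread; the values would be
    // org.apache.avro.Schema instances in a real job (assumption).
    private static final Map<String, String> SCHEMA_BY_TYPE = new HashMap<>();
    static {
        SCHEMA_BY_TYPE.put("s1", "s1.avsc");
        SCHEMA_BY_TYPE.put("s2", "s2.avsc");
    }

    // HashMap.get matches keys with equals(), so a type tag that was read
    // from input (and is not an interned literal) still matches.
    static String schemaFor(String type) {
        String schema = SCHEMA_BY_TYPE.get(type);
        if (schema == null) {
            // Fail here with a clear message instead of handing a null
            // record to the collector later.
            throw new IllegalArgumentException("No schema registered for type: " + type);
        }
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(new String("s2")));
        try {
            schemaFor("s9");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the mapper, the value returned by such a lookup would be passed to new GenericData.Record(...), so an unexpected type tag surfaces as an explicit error naming the tag rather than as a record that is not in the union.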