Avro >> mail # user >> Does Avro GenericData.Record violate the .equals contract?


Re: Does Avro GenericData.Record violate the .equals contract?
This does look like a bug in GenericData.Record#equals().  It should
return false when the schemas are not equal.  It currently only checks
the schema names as a performance optimization, but that optimization is
not a good one.  Can you please file a bug report in Jira?

Thanks,

Doug
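
To make the asymmetry and the proposed fix concrete, here is a minimal, self-contained sketch. It does not use Avro at all: ToySchema, ToyRecord, equalsByName and equalsBySchema are hypothetical names invented for illustration, and the logic is only a simplified model of the behavior described in this thread, not the actual GenericData.Record implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a record schema: a full name plus a per-field "ignore" flag. */
final class ToySchema {
    final String fullName;
    // field name -> true if the field's order is "ignore"
    final Map<String, Boolean> ignored = new LinkedHashMap<>();

    ToySchema(String fullName) { this.fullName = fullName; }

    ToySchema field(String name, boolean ignore) {
        ignored.put(name, ignore);
        return this;
    }

    @Override public boolean equals(Object o) {
        return o instanceof ToySchema
            && fullName.equals(((ToySchema) o).fullName)
            && ignored.equals(((ToySchema) o).ignored);
    }

    @Override public int hashCode() { return Objects.hash(fullName, ignored); }
}

/** Hypothetical stand-in for a generic record. */
final class ToyRecord {
    final ToySchema schema;
    final Map<String, Object> values = new LinkedHashMap<>();

    ToyRecord(ToySchema schema) { this.schema = schema; }

    ToyRecord put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    /** Compares field values using only THIS record's schema, skipping ignored fields. */
    private boolean fieldsEqual(ToyRecord that) {
        for (Map.Entry<String, Boolean> f : schema.ignored.entrySet()) {
            if (f.getValue()) continue; // ignored field: never affects the result
            if (!Objects.equals(values.get(f.getKey()), that.values.get(f.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Name-only schema check, modeled on the behavior discussed above; can be asymmetric. */
    boolean equalsByName(ToyRecord that) {
        return schema.fullName.equals(that.schema.fullName) && fieldsEqual(that);
    }

    /** Full schema check: records whose schemas differ are never equal. */
    boolean equalsBySchema(ToyRecord that) {
        return schema.equals(that.schema) && fieldsEqual(that);
    }
}

final class EqualsDemo {
    public static void main(String[] args) {
        ToyRecord r1 = new ToyRecord(
            new ToySchema("my.namespace.test_record").field("attribute1", true))
            .put("attribute1", "1");
        ToyRecord r2 = new ToyRecord(
            new ToySchema("my.namespace.test_record").field("attribute1", false))
            .put("attribute1", "2");

        System.out.println(r1.equalsByName(r2));   // true
        System.out.println(r2.equalsByName(r1));   // false
        System.out.println(r1.equalsBySchema(r2)); // false
        System.out.println(r2.equalsBySchema(r1)); // false
    }
}
```

With the name-only check, the answer depends on which record's schema drives the field comparison. Comparing the full schemas first gives the same answer in both directions, which is what the equals() contract requires.
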

On 02/10/2012 04:26 AM, Andrew Kenworthy wrote:
> Hallo Doug,
>
> Thank you for your feedback. I agree that implicitly using Order.IGNORE
> to ignore differences in records makes sense, as that is the criterion
> used to distinguish records when sorting. But it looks as though only the
> schema name is checked when deciding whether to examine each field or
> not. As the test below shows, this can make equals asymmetric if one
> is not careful. (The example is a "bad" one, since it is not a good idea
> to have two schemas with the same name and namespace yet different
> contents, but it shows how one might inadvertently make a wrong
> assumption about equality.)
>
> @Test
> public void test() {
>     Schema schema1 = Schema.createRecord("test_record", null,
>         "my.namespace", false);
>     List<Field> fields1 = new ArrayList<Field>();
>     fields1.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>         null, null, Order.IGNORE));
>     schema1.setFields(fields1);
>
>     Schema schema2 = Schema.createRecord("test_record", null,
>         "my.namespace", false);
>     List<Field> fields2 = new ArrayList<Field>();
>     fields2.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>         null, null, Order.ASCENDING));
>     schema2.setFields(fields2);
>
>     GenericRecord record1 = new GenericData.Record(schema1);
>     record1.put("attribute1", "1");
>     GenericRecord record2 = new GenericData.Record(schema2);
>     record2.put("attribute1", "2");
>
>     System.out.println(record1.equals(record2)); // returns TRUE
>     System.out.println(record2.equals(record1)); // returns FALSE
> }
>
> Andrew
>
>     ------------------------------------------------------------------------
>     *From:* Doug Cutting <[EMAIL PROTECTED]>
>     *To:* [EMAIL PROTECTED]
>     *Sent:* Thursday, February 9, 2012 8:49 PM
>     *Subject:* Re: Does Avro GenericData.Record violate the .equals
>     contract?
>
>     On 02/09/2012 07:02 AM, Andrew Kenworthy wrote:
>     > This means that if I have no sorting defined in my schema, that all
>     > records are treated as being equal to one another.
>
>     If you specify "order":"ignore" for all fields in a record, then, yes,
>     all instances of that record would be equal.  I cannot imagine a case
>     where this would be useful, but I also don't see how this would violate
>     the equals() contract.
>
>     The default for fields is to behave as if "order":"ascending" is
>     specified.  Records are equal if all of their fields that are not
>     specified as "order":"ignore" are equal.
>
>     Doug
>
>
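
The field-order rule quoted above (fields behave as "ascending" by default, and fields marked "ignore" never take part in the comparison) can be sketched as a plain comparison loop. This is an illustrative model only, not Avro's code: SortOrder, OrderedField and RecordComparison are hypothetical names, and field values are assumed to be non-null strings to keep the example short.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical mirror of the three sort orders a field can declare. */
enum SortOrder { ASCENDING, DESCENDING, IGNORE }

/** Hypothetical field descriptor: a name and its declared sort order. */
final class OrderedField {
    final String name;
    final SortOrder order;

    OrderedField(String name, SortOrder order) {
        this.name = name;
        this.order = order;
    }
}

final class RecordComparison {
    /**
     * Compares two records field by field, in declaration order.
     * Ignored fields are skipped, descending fields invert the result,
     * and the first non-zero field comparison decides the outcome.
     */
    static int compare(List<OrderedField> fields,
                       Map<String, String> a,
                       Map<String, String> b) {
        for (OrderedField f : fields) {
            if (f.order == SortOrder.IGNORE) continue;
            int c = a.get(f.name).compareTo(b.get(f.name));
            if (c != 0) {
                return f.order == SortOrder.DESCENDING ? -c : c;
            }
        }
        return 0; // every compared field matched, or every field was ignored
    }
}
```

Under this model, two records count as equal exactly when compare returns 0, so a record whose fields are all marked "ignore" compares equal to every other record of the same shape.
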