Avro >> mail # user >> Does Avro GenericData.Record violate the .equals contract?


Re: Does Avro GenericData.Record violate the .equals contract?
This does look like a bug in GenericData.Record#equals().  It should
return false when the schemas are not equal.  It currently only checks
the schema names as a performance optimization, but that optimization is
not a good one.  Can you please file a bug report in Jira?

Thanks,

Doug

On 02/10/2012 04:26 AM, Andrew Kenworthy wrote:
> Hallo Doug,
>
> Thank you for your feedback. I agree that implicitly using Order.IGNORE
> to ignore differences in records makes sense, as that is the criterion
> used to define distinction when sorting. But it looks as though only the
> schema name is checked when deciding whether to examine each field or
> not. This can, as the test below shows, result in a lack of symmetry
> when using equals if one is not careful (i.e. the example is a "bad" one
> as it's not a good idea to have two schemas with the same name and
> namespace yet with different contents, but shows how one might
> inadvertently make a wrong assumption about equality):-
>
> @Test
> public void test() {
>   // Two schemas with the same name and namespace; only the field order differs.
>   Schema schema1 = Schema.createRecord("test_record", null, "my.namespace", false);
>   List<Field> fields1 = new ArrayList<Field>();
>   fields1.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>       null, null, Order.IGNORE));
>   schema1.setFields(fields1);
>
>   Schema schema2 = Schema.createRecord("test_record", null, "my.namespace", false);
>   List<Field> fields2 = new ArrayList<Field>();
>   fields2.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>       null, null, Order.ASCENDING));
>   schema2.setFields(fields2);
>
>   // Same field name, different values.
>   GenericRecord record1 = new GenericData.Record(schema1);
>   record1.put("attribute1", "1");
>   GenericRecord record2 = new GenericData.Record(schema2);
>   record2.put("attribute1", "2");
>
>   // Each call compares using the receiver's schema, so the results differ.
>   System.out.println(record1.equals(record2)); // prints true
>   System.out.println(record2.equals(record1)); // prints false
> }
>
> Andrew
>
>     ------------------------------------------------------------------------
>     *From:* Doug Cutting <[EMAIL PROTECTED]>
>     *To:* [EMAIL PROTECTED]
>     *Sent:* Thursday, February 9, 2012 8:49 PM
>     *Subject:* Re: Does Avro GenericData.Record violate the .equals
>     contract?
>
>     On 02/09/2012 07:02 AM, Andrew Kenworthy wrote:
>     > This means that if I have no sorting defined in my schema, all
>     > records are treated as being equal to one another.
>
>     If you specify "order":"ignore" for all fields in a record, then, yes,
>     all instances of that record would be equal.  I cannot imagine a case
>     where this would be useful, but I also don't see how this would violate
>     the equals() contract.
>
>     The default for fields is to behave as if "order":"ascending" is
>     specified.  Records are equal if all of their fields that are not
>     specified as "order":"ignore" are equal.
>
>     Doug
>
>
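
For readers who want to see the asymmetry without an Avro dependency, here is a minimal plain-Java sketch of the behaviour discussed in this thread. It is not Avro's code: SketchField, SketchSchema, SketchRecord, nameOnlyEquals and schemaCheckedEquals are hypothetical names invented for illustration. The model assumes, as described above, that a record compares itself to another using its own schema and skips fields marked "order":"ignore". The second method shows one way the suggested fix might look, comparing whole schemas rather than just their names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class EqualsSymmetrySketch {

    /** Hypothetical stand-in for a schema field: a name plus an "order: ignore" flag. */
    static final class SketchField {
        final String name;
        final boolean ignored;

        SketchField(String name, boolean ignored) {
            this.name = name;
            this.ignored = ignored;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SketchField)) return false;
            SketchField f = (SketchField) o;
            return name.equals(f.name) && ignored == f.ignored;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, ignored);
        }
    }

    /** Hypothetical stand-in for a record schema: a full name plus its fields. */
    static final class SketchSchema {
        final String fullName;
        final List<SketchField> fields;

        SketchSchema(String fullName, SketchField... fields) {
            this.fullName = fullName;
            this.fields = Arrays.asList(fields);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SketchSchema)) return false;
            SketchSchema s = (SketchSchema) o;
            return fullName.equals(s.fullName) && fields.equals(s.fields);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fullName, fields);
        }
    }

    /** Hypothetical stand-in for a generic record: a schema plus positional values. */
    static final class SketchRecord {
        final SketchSchema schema;
        final Object[] values;

        SketchRecord(SketchSchema schema, Object... values) {
            this.schema = schema;
            this.values = values;
        }

        /** Checks only the schema name, then compares using this record's schema. */
        boolean nameOnlyEquals(SketchRecord that) {
            if (!schema.fullName.equals(that.schema.fullName)) return false;
            return fieldsMatch(that);
        }

        /** Requires fully equal schemas before comparing any field values. */
        boolean schemaCheckedEquals(SketchRecord that) {
            if (!schema.equals(that.schema)) return false;
            return fieldsMatch(that);
        }

        /** Compares values position by position, skipping fields this schema ignores. */
        private boolean fieldsMatch(SketchRecord that) {
            if (values.length != that.values.length) return false;
            for (int i = 0; i < schema.fields.size(); i++) {
                if (schema.fields.get(i).ignored) continue;
                if (!Objects.equals(values[i], that.values[i])) return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        SketchSchema ignoring = new SketchSchema("my.namespace.test_record",
                new SketchField("attribute1", true));
        SketchSchema ascending = new SketchSchema("my.namespace.test_record",
                new SketchField("attribute1", false));
        SketchRecord record1 = new SketchRecord(ignoring, "1");
        SketchRecord record2 = new SketchRecord(ascending, "2");

        System.out.println(record1.nameOnlyEquals(record2));      // true
        System.out.println(record2.nameOnlyEquals(record1));      // false
        System.out.println(record1.schemaCheckedEquals(record2)); // false
        System.out.println(record2.schemaCheckedEquals(record1)); // false
    }
}
```

With the name-only check the two calls disagree, which is the symmetry violation reported above. With the full schema comparison both calls return false, because the schemas differ in the field's order setting.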