Avro, mail # user - questions about sort-orders

Re: questions about sort-orders
Philip Zeyliger 2010-12-02, 18:06
> Lastly, it says "Note also that Avro binary-encoded data can be efficiently
> ordered without deserializing it to objects." What does this mean exactly?

This is hinting at an implementation detail, though one of historical interest.

There exists code, in Java, to compare two Avro objects based on their
byte[] representations.  This code happens to not create any objects;
rather, it deals with bytes directly, and thus it's "efficient".

This is of historical interest because of Avro's intended use in
Hadoop MapReduce.  By default, Hadoop's sort deserializes the key
objects in order to compare them, but there's a way to specify a
"binary comparator" (a RawComparator), which tells Hadoop to
instantiate a class that compares the serialized bytes, roughly
compare(byte[], byte[]), instead of calling compare(Object, Object).
So, the spec is pointing out that this is possible and that there's
an implementation of it.
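
To make the idea concrete, here is a minimal sketch of a comparator that
orders Avro-encoded ints by reading the bytes directly. It is not Avro's
or Hadoop's actual code (Avro's Java library has its own implementation,
in org.apache.avro.io.BinaryData as far as I know). The class name
RawIntComparatorSketch and its helpers are hypothetical, and it assumes
the key is a single Avro "int" (zigzag, then variable-length encoding, as
described in the spec). The compare method only mirrors the shape of
Hadoop's RawComparator.compare(byte[], int, int, byte[], int, int).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration: not part of Avro or Hadoop.
public class RawIntComparatorSketch {

    // Encode an int as the Avro spec describes: zigzag, then 7 bits per byte,
    // least-significant group first, high bit set on all but the last byte.
    public static byte[] encodeInt(int n) {
        int z = (n << 1) ^ (n >> 31);          // zigzag: small magnitudes -> small codes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        out.write(z);
        return out.toByteArray();
    }

    // Decode an int starting at buf[start]. Works on primitives only,
    // so no key object is ever instantiated.
    public static int decodeInt(byte[] buf, int start) {
        int z = 0;
        int shift = 0;
        int pos = start;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);           // undo zigzag
    }

    // Same parameter shape as Hadoop's RawComparator.compare. The lengths
    // are unused here because the varint encoding is self-delimiting.
    public static int compare(byte[] b1, int s1, int l1,
                              byte[] b2, int s2, int l2) {
        return Integer.compare(decodeInt(b1, s1), decodeInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = encodeInt(-2);
        byte[] b = encodeInt(1);
        // -2 sorts before 1 even though we never built an Integer or a record.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0); // prints "true"
    }
}
```

A real comparator would walk the writer's schema and handle every Avro
type and each field's sort order. This sketch covers a single int only.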

You are right that plain ol' byte comparison does not sort Avro
objects correctly.  (This is kind of a bummer, in my opinion.  It
makes Avro objects a poor fit for HBase keys, since HBase orders
rows by plain byte comparison of their keys.)
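
A small sketch of why plain byte comparison goes wrong, assuming the key
is a single Avro "int" encoded as the spec describes (zigzag, then
variable-length). The class BytewiseOrderSketch and its methods are
hypothetical names for illustration only. compareBytes is a plain
unsigned lexicographic comparison, which is the kind of ordering a
byte-sorted store would apply.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration: not part of Avro or HBase.
public class BytewiseOrderSketch {

    // Avro-style int encoding: zigzag, then 7 bits per byte, low group first.
    public static byte[] encodeInt(int n) {
        int z = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        out.write(z);
        return out.toByteArray();
    }

    // Plain unsigned lexicographic comparison of two byte arrays.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // 100 encodes as C8 01 and 8192 as 80 80 01, so bytewise 100 sorts AFTER 8192.
        System.out.println(compareBytes(encodeInt(100), encodeInt(8192)) > 0); // prints "true"
        // -2 encodes as 03 and 1 as 02, so bytewise -2 sorts AFTER 1.
        System.out.println(compareBytes(encodeInt(-2), encodeInt(1)) > 0);     // prints "true"
    }
}
```

Two things break the ordering here: zigzag interleaves negative and
positive values, and the variable-length encoding writes the
low-order bits first.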

-- Philip