Avro, mail # user - Can serialized Avro records be efficiently compared without deserializing?


Jonathan Coveney 2012-05-22, 20:22
Russell Jurney 2012-05-22, 21:43
Re: Can serialized Avro records be efficiently compared without deserializing?
Doug Cutting 2012-05-23, 17:28
On Tue, May 22, 2012 at 1:22 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> Imagine I use Avro to serialize an object (without loss of generality, let's
> say an array of longs). I'm curious whether it is possible to compare those
> arrays without deserializing, i.e. look at the bytes in memory or on disk
> and do the comparison based on those bytes (i.e. the raw comparison that
> Hadoop does in the shuffle sort).
>
> I poked around the documentation but wasn't sure where to look.

Yes, this is possible.

The Java method that does this is BinaryData#compare().

http://avro.apache.org/docs/current/api/java/org/apache/avro/io/BinaryData.html#compare(byte[], int, byte[], int, org.apache.avro.Schema)
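
To illustrate what such a raw comparison involves, here is a minimal, Avro-free sketch for the array-of-longs case from the question. It assumes the binary layout described in the Avro specification (longs as zig-zag, variable-length integers; arrays as counted blocks terminated by a zero count) and compares element by element straight from the bytes, without building any objects. The class RawLongArrayCompare and its methods are hypothetical names used only for illustration and are not part of Avro. With Avro on the classpath, the call would instead be BinaryData.compare(b1, s1, b2, s2, schema), following the signature linked above.

```java
import java.io.ByteArrayOutputStream;

/** Sketch only: hand-rolled Avro-style encoding and raw comparison for an array of longs. */
final class RawLongArrayCompare {

    /** Cursor over an encoded buffer; pos advances as values are read. */
    static final class Cursor {
        final byte[] buf;
        int pos;

        Cursor(byte[] buf, int pos) { this.buf = buf; this.pos = pos; }

        /** Reads one zig-zag, variable-length long. */
        long readLong() {
            long raw = 0;
            int shift = 0;
            int b;
            do {
                b = buf[pos++] & 0xff;
                raw |= (long) (b & 0x7f) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
        }

        /** Reads a block count; a negative count is followed by a byte size, which is skipped. */
        long readBlockCount() {
            long n = readLong();
            if (n < 0) { readLong(); n = -n; }
            return n;
        }
    }

    /** Writes one long as a zig-zag, variable-length integer. */
    static void writeLong(ByteArrayOutputStream out, long v) {
        long raw = (v << 1) ^ (v >> 63);
        while ((raw & ~0x7fL) != 0) {
            out.write((int) ((raw & 0x7f) | 0x80));
            raw >>>= 7;
        }
        out.write((int) raw);
    }

    /** Encodes the values as a single block followed by the zero end marker. */
    static byte[] encode(long... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (values.length > 0) {
            writeLong(out, values.length);
            for (long v : values) writeLong(out, v);
        }
        writeLong(out, 0); // end of array
        return out.toByteArray();
    }

    /** Compares two encoded long arrays lexicographically, reading directly from the bytes. */
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        Cursor c1 = new Cursor(b1, s1);
        Cursor c2 = new Cursor(b2, s2);
        long left1 = 0, left2 = 0; // items remaining in the current block of each array
        while (true) {
            if (left1 == 0) left1 = c1.readBlockCount();
            if (left2 == 0) left2 = c2.readBlockCount();
            if (left1 == 0 || left2 == 0) {
                // At least one array has ended: a proper prefix sorts first.
                return left1 == left2 ? 0 : (left1 == 0 ? -1 : 1);
            }
            long n = Math.min(left1, left2);
            for (long i = 0; i < n; i++) {
                int c = Long.compare(c1.readLong(), c2.readLong());
                if (c != 0) return c;
            }
            left1 -= n;
            left2 -= n;
        }
    }
}
```

Note that this is not a plain byte-wise memcmp: because of the zig-zag encoding, the bytes for -2 are numerically larger than the bytes for 1, so the comparison has to decode each long as it walks the two buffers. A call such as RawLongArrayCompare.compare(RawLongArrayCompare.encode(1, 2, 3), 0, RawLongArrayCompare.encode(1, 2, 4), 0) yields a negative result.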

Doug
Jon Coveney 2012-05-24, 06:13