Avro >> mail # user >> Record sort order is "lexicographically by field" -- what does that mean?


Jeremy Kahn 2013-03-27, 23:45
Harsh J 2013-03-28, 11:01
Re: Record sort order is "lexicographically by field" -- what does that mean?
Thanks for the information, Harsh. Further comments inline below:

On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
> > I can read "ordered lexicographically by field" in two ways:
> >
> > 1. the names of the fields are sorted lexicographically, and the field that
> > goes lexicographically first (not marked as "order": "ignore") dominates.
> >
> > 2. the records are sorted by the sort order of each field, with the first
> > fields (not marked "order": "ignore") taking sort priority.
>
> The second one is correct. The fields' order in the defined schema is
> not changed; the fields are only walked through in that order.
>
> [...] that's true from my use of it in Hadoop MR as well.
>

Okay, this is very helpful to know: it's working the way I had hoped.
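For the archive, here is a minimal sketch of reading (2) in plain Python, not the Avro library. It assumes records are modeled as dicts and a schema as a list of (field name, "order" value) pairs kept in declaration order. The field names, the sample records, and the `compare_records` helper are hypothetical and only for illustration. The sketch handles only directly comparable values such as ints and strings, not nested Avro types.

```python
from functools import cmp_to_key

def compare_records(schema, a, b):
    """Compare two records field by field, in schema declaration order.

    The first field not marked "ignore" whose values differ decides the
    result (reading (2)). Field names are never sorted. Only directly
    comparable values (ints, strs) are supported in this sketch.
    """
    for name, order in schema:
        if order == "ignore":
            continue  # ignored fields never influence the ordering
        x, y = a[name], b[name]
        if x == y:
            continue  # tie on this field: fall through to the next one
        result = -1 if x < y else 1
        return -result if order == "descending" else result
    return 0  # equal on every non-ignored field

# Hypothetical schema: (field name, "order" attribute) in declaration order.
schema = [("species", "ascending"), ("count", "descending"), ("keeper", "ignore")]
records = [
    {"species": "zebra", "count": 3, "keeper": "Ann"},
    {"species": "lion", "count": 2, "keeper": "Bob"},
    {"species": "lion", "count": 5, "keeper": "Cy"},
]
ordered = sorted(records, key=cmp_to_key(lambda a, b: compare_records(schema, a, b)))
print([(r["species"], r["count"]) for r in ordered])
# [('lion', 5), ('lion', 2), ('zebra', 3)]
```

Moving "count" ahead of "species" in the schema would change which field dominates. That is the "adjust the order of the schema" knob discussed below.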

> > Behavior (2) -- relative to behavior (1) -- offers the ability to adjust the
> > order of the schema to express a different sort order, but might present
> > problems for schema negotiation.
>
> What kind of problems are you describing here? Sorry if I'm not
> getting it by the words "schema negotiation" alone.
>

Suppose I sort a sequence of ZooInventory objects by the sort order implied
by this schema, and I send them to you, in sorted order, over a protocol
whose IDL type specification is array<ZooInventory>. You *read* the sequence
with a different ZooInventory schema that has the same fields but declares
them in a different order. The objects in the array will not (necessarily)
appear to be sorted *to you*.

This isn't necessarily a problem -- it might actually be a feature. But it
is worth noting that two schemas may be compatible under schema negotiation
yet imply different sort orders for the reader and the writer.
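To make that concrete, here is a small sketch in plain Python, again not the Avro library. It assumes every field uses the default ascending order, so a record's sort key is simply the tuple of its field values taken in schema declaration order. The ZooInventory field names and records are hypothetical.

```python
# Hypothetical ZooInventory records; field names are illustrative only.
records = [
    {"species": "lion", "count": 7},
    {"species": "otter", "count": 2},
    {"species": "zebra", "count": 4},
]

# Writer and reader schemas declare the same fields, in different orders.
writer_fields = ["species", "count"]
reader_fields = ["count", "species"]

def sort_key(fields, record):
    # With all fields ascending, the key is the field values in declaration order.
    return tuple(record[name] for name in fields)

def is_sorted(fields, seq):
    keys = [sort_key(fields, r) for r in seq]
    return all(k1 <= k2 for k1, k2 in zip(keys, keys[1:]))

# The writer sorts under its own schema before sending array<ZooInventory>.
sent = sorted(records, key=lambda r: sort_key(writer_fields, r))
print(is_sorted(writer_fields, sent))  # True
print(is_sorted(reader_fields, sent))  # False
```

Because Avro schema resolution matches record fields by name, the reader sees exactly the same field values. Only the sort order implied by its own schema differs.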

--jeremy
Harsh J 2013-03-28, 18:15