Re: Record sort order is "lexicographically by field" -- what does that mean?
Jeremy Kahn 2013-03-28, 17:57
Thanks for the information, Harsh. Further comments inline below:
On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
> > I can read "ordered lexicographically by field" in two ways:
> > 1. the names of the fields are sorted lexicographically, and the field that
> > sorts lexicographically first (not marked "order": "ignore") dominates.
> > 2. the records are sorted by the sort order of each field, with the first
> > fields (not marked "order": "ignore") taking sort priority.
> The second one is correct. The order of the fields in the defined schema is
> not changed; the fields are simply walked through in that order.
> [...] that's true from my use of it in Hadoop MR as well.
Okay, this is very helpful to know: it's working the way I had hoped.
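
To make interpretation (2) concrete, here is a minimal sketch in plain Python. It does not use the Avro library, and the field names (species, keeper_note, count) are hypothetical, since the thread never lists the ZooInventory fields. It only assumes the "order" attribute takes the values "ascending", "descending", or "ignore", and it compares values with Python's own ordering, which is a simplification of Avro's per-type comparison rules.

```python
from functools import cmp_to_key

# Hypothetical schema fields, listed in declaration order.
# Declaration order is what sets sort priority under interpretation (2).
ZOO_FIELDS = [
    {"name": "species", "order": "ascending"},
    {"name": "keeper_note", "order": "ignore"},
    {"name": "count", "order": "descending"},
]


def compare_records(a, b, fields):
    """Walk the fields in declaration order; the first difference decides."""
    for field in fields:
        order = field.get("order", "ascending")
        if order == "ignore":
            continue  # ignored fields never affect the comparison
        x, y = a[field["name"]], b[field["name"]]
        if x != y:
            result = -1 if x < y else 1
            return -result if order == "descending" else result
    return 0  # equal on every non-ignored field


def sort_records(records, fields):
    return sorted(
        records, key=cmp_to_key(lambda a, b: compare_records(a, b, fields))
    )


records = [
    {"species": "zebra", "keeper_note": "a", "count": 3},
    {"species": "aardvark", "keeper_note": "z", "count": 1},
    {"species": "aardvark", "keeper_note": "b", "count": 7},
]
sorted_records = sort_records(records, ZOO_FIELDS)
```

In this sketch the two aardvark records come first because species is declared first, and the tie between them is broken by count in descending order; keeper_note plays no part.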
> > Behavior (2) -- relative to behavior (1) -- offers the ability to reorder
> > the fields in the schema to express a different sort order, but might
> > present problems for schema negotiation.
> What kind of problems are you describing here? Sorry if I'm not
> getting it by the words "schema negotiation" alone.
Suppose I sort a sequence of ZooInventory objects by the sort order implied
by this schema, and I send them to you in sorted order over a protocol with
an IDL type specification of array<ZooInventory>. You *read* the sequence
with a different ZooInventory schema that has the same fields but declares
them in a different order. The objects in the array will not
(necessarily) appear to be sorted *to you*.
This isn't necessarily a problem -- it might actually be a feature. It is
worth noting that two schemas may be compatible under schema negotiation
but have different sort orders for reader and writer.
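
A small sketch of that mismatch, again in plain Python rather than the Avro library. The two field orders below are hypothetical stand-ins for a writer's and a reader's ZooInventory schema, and every field is assumed to sort ascending, so a tuple of field values in declaration order can serve as the sort key.

```python
def sort_key(record, field_order):
    """Build a key from field values taken in schema declaration order."""
    return tuple(record[name] for name in field_order)


def is_sorted(records, field_order):
    """True if the sequence is non-decreasing under the given field order."""
    keys = [sort_key(r, field_order) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical declaration orders: same fields, different order.
WRITER_ORDER = ["species", "count"]
READER_ORDER = ["count", "species"]

records = [
    {"species": "zebra", "count": 1},
    {"species": "aardvark", "count": 5},
    {"species": "lion", "count": 2},
]

# The writer sorts by its own schema before sending.
sent = sorted(records, key=lambda r: sort_key(r, WRITER_ORDER))

writer_view = is_sorted(sent, WRITER_ORDER)  # sorted by species, then count
reader_view = is_sorted(sent, READER_ORDER)  # checked by count, then species
```

Here the writer sees a sorted array, while the reader, whose schema puts count first, sees counts of 5, 2, 1 and so does not consider the same array sorted.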