Re: Record sort order is "lexicographically by field" -- what does that mean?
Harsh J 2013-03-28, 18:15
Hmm, I haven't used the messaging aspects of Avro much yet, but
AFAIK the sort order is only applied when you explicitly call the
BinaryData.compare(…) API methods. If the IPC parts use those for some
reason to compare two or more messages, then I can imagine this being
a problem there as well.
On Thu, Mar 28, 2013 at 11:27 PM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
> Thanks for the information, Harsh. Further comments inline below:
> On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
>> > I can read "ordered lexicographically by field" in two ways:
>> > 1. the names of the fields are sorted lexicographically, and the field that
>> > goes lexicographically first (not marked as "order": "ignore") dominates.
>> > 2. the records are sorted by the sort order of each field, with the first
>> > fields (not marked "order": "ignore") taking sort priority.
>> The second one is correct. The fields are not reordered; they are
>> simply compared in the order in which the schema declares them.
>> [...] that's true from my use of it in Hadoop MR as well.
> Okay, this is very helpful to know: it's working the way I had hoped.
>> > Behavior (2) -- relative to behavior (1) -- offers the ability to adjust the
>> > order of the schema to express a different sort order, but might present
>> > problems for schema negotiation.
>> What kind of problems are you describing here? Sorry if I'm not
>> getting it by the words "schema negotiation" alone.
> Suppose I sort a sequence of ZooInventory objects by the sort order implied
> by this schema, and I send them to you in sorted order over a protocol with
> an IDL type specification of array<ZooInventory>. You *read* the sequence
> with a different ZooInventory schema that has the same fields but declares
> them in a different order. The objects in the array will not
> (necessarily) appear to be sorted *to you*.
> This isn't necessarily a problem -- it might actually be a feature. It is
> worth noting that two schemas may be compatible under schema negotiation but
> have different sort orders for reader and writer.
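
To make the reader/writer mismatch concrete, here is a rough sketch in plain Python. It does not use the Avro library; it only models interpretation (2), comparing records field by field in schema declaration order and skipping fields marked "ignore". The ZooInventory schema is not shown in this thread, so the field names `species` and `count`, and the helpers `compare_records` and `is_sorted`, are made up for illustration.

```python
from functools import cmp_to_key


def compare_records(a, b, fields):
    """Compare two dict records field by field, in declaration order.

    `fields` is a list of (name, order) pairs standing in for a record
    schema; order is "ascending", "descending", or "ignore".
    Returns a negative number, zero, or a positive number.
    """
    for name, order in fields:
        if order == "ignore":
            continue  # ignored fields never influence the result
        x, y = a[name], b[name]
        c = (x > y) - (x < y)
        if c != 0:
            # The first non-ignored field that differs decides the result.
            return -c if order == "descending" else c
    return 0


def is_sorted(records, fields):
    """True if each record compares <= its successor under `fields`."""
    return all(
        compare_records(p, q, fields) <= 0
        for p, q in zip(records, records[1:])
    )


# Hypothetical field lists: same fields, different declaration order.
writer_fields = [("species", "ascending"), ("count", "ascending")]
reader_fields = [("count", "ascending"), ("species", "ascending")]

zoo = [
    {"species": "zebra", "count": 2},
    {"species": "aardvark", "count": 7},
    {"species": "lemur", "count": 4},
]

# The writer sorts by its own schema before sending.
sent = sorted(
    zoo, key=cmp_to_key(lambda a, b: compare_records(a, b, writer_fields))
)

print(is_sorted(sent, writer_fields))  # sorted from the writer's point of view
print(is_sorted(sent, reader_fields))  # not sorted from the reader's point of view
```

The writer's ordering is by species first, so the counts arrive as 7, 4, 2. A reader whose schema declares `count` first compares on that field before anything else and therefore sees the same array as unsorted.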