Avro >> mail # user >> Record sort order is "lexicographically by field" -- what does that mean?


Re: Record sort order is "lexicographically by field" -- what does that mean?
Thanks for the information, Harsh. Further comments inline below:

On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
> > I can read "ordered lexicographically by field" in two ways:
> >
> > 1. the names of the fields are sorted lexicographically, and the field that
> > goes lexicographically first (not marked as "order":"ignore") dominates.
> >
> > 2. the records are sorted by the sort order of each field, with the first
> > fields (not marked "order": "ignore") taking sort priority.
>
> The second one is correct. The field's order in the defined schema is
> not changed but only walked through.
>
> [...] that's true from my use of it in Hadoop MR as well.
>

Okay, this is very helpful to know: it's working the way I had hoped.
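
To make interpretation (2) concrete, here is a minimal sketch in plain Python. It does not use the Avro library; the field names, the "order" values, and the helper functions compare_records and sort_records are all hypothetical, chosen only for illustration. Records are compared field by field in the order the schema declares them, skipping fields marked "order": "ignore" and reversing the comparison for "descending" fields.

```python
from functools import cmp_to_key

# Hypothetical schema fields, listed in declaration order (not real Avro objects).
ZOO_FIELDS = [
    {"name": "species", "order": "ascending"},
    {"name": "keeper_notes", "order": "ignore"},
    {"name": "count", "order": "descending"},
]


def compare_records(a, b, fields):
    """Compare two dict records field by field, in schema declaration order."""
    for field in fields:
        order = field.get("order", "ascending")
        if order == "ignore":
            continue  # ignored fields never affect the ordering
        x, y = a[field["name"]], b[field["name"]]
        if x != y:
            result = -1 if x < y else 1
            return -result if order == "descending" else result
    return 0  # all non-ignored fields are equal


def sort_records(records, fields):
    """Sort records using the field-by-field comparison above."""
    return sorted(records, key=cmp_to_key(lambda a, b: compare_records(a, b, fields)))


records = [
    {"species": "zebra", "keeper_notes": "a", "count": 3},
    {"species": "aardvark", "keeper_notes": "z", "count": 1},
    {"species": "aardvark", "keeper_notes": "m", "count": 7},
]

for record in sort_records(records, ZOO_FIELDS):
    print(record["species"], record["count"])
```

Note that the name "count" sorts before "species" alphabetically, yet "species" still dominates because it is declared first; under interpretation (1) the result would differ.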

> > Behavior (2) -- relative to behavior (1) -- offers the ability to adjust the
> > order of the schema to express a different sort order, but might present
> > problems for schema negotiation.
>
> What kind of problems are you describing here? Sorry if I'm not
> getting it by the words "schema negotiation" alone.
>

Suppose I sort a sequence of ZooInventory objects by the sort order implied
by this schema, and I send them to you in sorted order over a protocol with
an IDL type specification of array<ZooInventory>.  You *read* the sequence
with a different ZooInventory schema that has the same fields but declares
them in a different order. The objects in the array will not (necessarily)
appear to be sorted *to you*.

This isn't necessarily a problem -- it might actually be a feature. It is
worth noting that two schemas may be compatible under schema negotiation
but imply different sort orders for reader and writer.
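
The reader/writer mismatch described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Avro code: WRITER_FIELDS and READER_FIELDS are hypothetical field orderings for a ZooInventory-like record, and compare_records and is_sorted are made-up helpers. Both sides see the same fields; only the declaration order differs.

```python
from functools import cmp_to_key

# Hypothetical field orderings: same fields, different declaration order.
WRITER_FIELDS = ["species", "count"]
READER_FIELDS = ["count", "species"]


def compare_records(a, b, field_names):
    """Compare two dict records field by field, in the given field order."""
    for name in field_names:
        if a[name] != b[name]:
            return -1 if a[name] < b[name] else 1
    return 0


def is_sorted(records, field_names):
    """Return True if each record compares <= its successor."""
    return all(
        compare_records(x, y, field_names) <= 0
        for x, y in zip(records, records[1:])
    )


inventory = [
    {"species": "zebra", "count": 2},
    {"species": "aardvark", "count": 9},
    {"species": "lion", "count": 4},
]

# The writer sorts using its own field order before sending.
sent = sorted(
    inventory,
    key=cmp_to_key(lambda a, b: compare_records(a, b, WRITER_FIELDS)),
)

print(is_sorted(sent, WRITER_FIELDS))  # sorted from the writer's point of view
print(is_sorted(sent, READER_FIELDS))  # not sorted from the reader's point of view
```

The same sequence is in order for the writer (species first) but out of order for the reader (count first), even though every record decodes to identical field values on both sides.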

--jeremy