sammefford 2013-03-21, 18:26
Row numbers are not stored explicitly. They are implicit in the
ordinal position of values in the file.
Values are not sorted; each column stores its values in row order. The
primary performance advantage of a columnar file is that, when only a
subset of columns is required, only a subset of the data need be read.
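
As a rough illustration, here is a toy in-memory model of the idea. It is
not Trevni's on-disk layout or API; the column names and the read_rows
helper are made up for the example. Each column is a list written in the
same row order, and rows are rebuilt by pairing values by position:

```python
# Hypothetical stand-in for a columnar file: one list per column,
# all written in the same row order. No row ids are stored anywhere.
columns = {
    "id":    [101, 102, 103],
    "name":  ["ann", "bob", "cy"],
    "score": [7.5, 3.0, 9.1],
}

def read_rows(columns, wanted):
    """Rebuild rows from only the wanted columns, pairing values by position."""
    # Only the requested columns are looked up; the others are never touched.
    selected = [columns[name] for name in wanted]
    return [dict(zip(wanted, values)) for values in zip(*selected)]

rows = read_rows(columns, ["id", "score"])
# Row n is simply the n-th value of each selected column.
assert rows[1] == {"id": 102, "score": 3.0}
```

In this sketch the "name" column is never read, which is the analogue of
skipping unneeded column data on disk.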
On Thu, Mar 21, 2013 at 11:26 AM, sammefford <[EMAIL PROTECTED]> wrote:
> I read the Trevni Specification:
> and I can't see where the row ids are stored for each value in each column.
> Am I missing something obvious? Is the spec incomplete on that point?
> Also, to confirm, my understanding is columnar formats are efficient because
> they store column values sorted and can thereby find specific values or
> ranges of values quickly. While the spec mentions the benefits of sorting,
> I don't see a requirement that column values be sorted. Can we depend that
> the blocks of column values are sorted?
> Sam Mefford
> Chief Architect-Big Data Solutions
> Avalon Consulting, LLC.
> View this message in context: http://apache-avro.679487.n3.nabble.com/Where-are-the-rows-in-Trevni-format-tp4026663.html
> Sent from the Avro - Users mailing list archive at Nabble.com.