After reading the paper, I think PAX is a really good fit for Drill storage.
One of its benefits is that a query scans only the columns it references and ignores the others.
In fact, Dremel does not scan the full table either; it skips the many columns
that a given query does not use.
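
To make the point concrete, here is a rough sketch of the PAX idea in Python. It is only an illustration under my own assumptions, not a proposal for the actual Drill format: the names build_pax_pages and scan_column, the dict-based page layout, and the sample rows are all made up. Rows are grouped into pages, and inside each page the values are stored column by column, so a query touching one column reads only that column's part of each page.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]
Page = Dict[str, List[Any]]  # hypothetical page: column name -> values of that column


def build_pax_pages(rows: Sequence[Row], columns: Sequence[str],
                    rows_per_page: int) -> List[Page]:
    """Group rows into pages; inside each page, store values per column."""
    pages: List[Page] = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        # One "minipage" per column: all values of that column for this chunk.
        pages.append({col: [row[col] for row in chunk] for col in columns})
    return pages


def scan_column(pages: Sequence[Page], column: str) -> List[Any]:
    """Read a single column, touching only that column's minipage in each page."""
    values: List[Any] = []
    for page in pages:
        values.extend(page[column])
    return values


# Made-up sample data, only for illustration.
rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
    {"id": 3, "name": "c", "score": 30},
]
pages = build_pax_pages(rows, ["id", "name", "score"], rows_per_page=2)
scores = scan_column(pages, "score")
```

A real on-disk format would also need per-page offsets and the header metadata David mentions, so that a reader can seek straight to one column's minipage; this sketch keeps everything in memory and leaves that out.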
On Sat, Sep 15, 2012 at 4:05 AM, David Gruzman <[EMAIL PROTECTED]> wrote:
> Hi All,
> I would like to discuss what the native format for Drill should be. The
> original Google Dremel paper defined its own hierarchical columnar data
> format. Since then Google has shifted away from the hierarchical data
> format... So the question is whether it makes sense to stick with it.
> If we are also moving to a simple flat format, we need our own format that
> we support as "native". In the case of Drill, I would define native support
> as "high performance".
> I think we can go to some kind of PAX format with comprehensive metadata in
> the header, so each file is completely self-contained and can be understood
> and processed without any external data.
> An alternative is to have a single file per column. As far as I remember from
> our OpenDremel work, the main decision point is whether we can read one column
> from the file without loading into node memory unnecessary data from other
> With best regards,