Gunther, is there anything extra that needs to be done to ship Parquet code with Hive right now?  If I read the patch correctly, the Parquet jars were added to the pom and will thus be shipped as part of Hive.  As long as it works out of the box when a user says “create table … stored as parquet”, why do we care whether the parquet jar is owned by Hive or another project?

The concern about feature mismatch between Parquet and Hive is valid, but I’m not sure what to do about it other than ensure there are good error messages.  Users will often want to use non-Hive storage formats (Parquet, Avro, etc.).  This means we need a good way to detect at SQL compile time that the underlying storage doesn’t support the indicated data type, and to throw a good error.
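To make the idea concrete, a compile-time capability check could look something like the sketch below. Everything here (the class, method names, and which types a given format supports) is hypothetical illustration, not Hive’s actual API or Parquet’s actual support matrix:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: each storage format declares which types it supports,
// so the planner can reject a CREATE TABLE (or query) at compile time with a
// clear error instead of failing obscurely at read/write time.
public class FormatCapabilityCheck {

    enum HiveType { INT, BIGINT, STRING, TIMESTAMP, DECIMAL }

    // Illustrative only: a real support matrix would live with each format's SerDe.
    static Set<HiveType> supportedTypes(String format) {
        switch (format) {
            case "orc":
                return EnumSet.allOf(HiveType.class);
            case "parquet":
                // Assume, for illustration, this Parquet version lacks TIMESTAMP.
                return EnumSet.of(HiveType.INT, HiveType.BIGINT,
                                  HiveType.STRING, HiveType.DECIMAL);
            default:
                return EnumSet.noneOf(HiveType.class);
        }
    }

    static void checkColumn(String format, String column, HiveType type) {
        if (!supportedTypes(format).contains(type)) {
            // A good error names the format, the column, and the offending type.
            throw new IllegalArgumentException(
                "Storage format '" + format + "' does not support type "
                + type + " (column '" + column + "')");
        }
    }

    public static void main(String[] args) {
        checkColumn("orc", "ts", HiveType.TIMESTAMP);  // passes
        try {
            checkColumn("parquet", "ts", HiveType.TIMESTAMP);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // the "good error" for the user
        }
    }
}
```

The point isn’t the specific mechanism — it’s that the check happens when the DDL is compiled, so the user hears about the mismatch immediately and in terms of their table definition.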

Also, it’s important to be clear going forward about what Hive as a project is signing up for.  If tomorrow someone decides to add a new data type or feature, we need to be clear that we expect the contributor to make it work for Hive-owned formats (text, RC, sequence, ORC) but not necessarily for external formats (Parquet, Avro).


On Feb 17, 2014, at 7:03 PM, Gunther Hagleitner <[EMAIL PROTECTED]> wrote:

