Hive, mail # user - ORC queries inefficient for sorted field


Re: ORC queries inefficient for sorted field
Prasanth Jayachandran 2014-02-24, 20:06
Hi Bryan

ORC indexes are used only for the selection of stripes and row groups and not for answering queries.

You can enable the hive.compute.query.using.stats flag to answer queries from metadata. When this flag is enabled, Hive checks the metastore to see whether column statistics exist for the required columns. If they do, certain queries such as min, max, and count are answered without scanning the table.
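
As a rough sketch of what that might look like in a Hive session: the table name `events` and the column `event_time` are hypothetical, and this assumes statistics are gathered with ANALYZE TABLE before the query runs. Whether a particular query actually qualifies depends on the Hive version and on the statistics being current.

```sql
-- Hypothetical table and column names; substitute your own schema.

-- Gather basic table statistics (row count and similar) in the metastore.
ANALYZE TABLE events COMPUTE STATISTICS;

-- Gather column statistics (min, max, etc.) for the sorted column.
ANALYZE TABLE events COMPUTE STATISTICS FOR COLUMNS event_time;

-- Allow Hive to answer eligible aggregates from metastore statistics.
SET hive.compute.query.using.stats=true;

-- With up-to-date statistics, a query of this shape may be answered
-- from metadata instead of scanning the ORC files.
SELECT min(event_time), max(event_time) FROM events;
```

If the statistics are missing or stale, Hive should fall back to a normal scan, so it is worth re-running ANALYZE after loading new data.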

This is the JIRA that added this feature (it is available in Hive 0.13.0):
https://issues.apache.org/jira/browse/HIVE-5483

Thanks
Prasanth Jayachandran

On Feb 22, 2014, at 7:03 PM, Bryan Jeffrey <[EMAIL PROTECTED]> wrote:
