Hive, mail # user - TPC-H queries on Hive 0.12


Re: TPC-H queries on Hive 0.12
Yin Huai 2013-11-22, 19:44
I remember that those scripts use text files. With 0.12, I think ORC
should be used instead. Also, I think those sub-queries should be merged
into a single query. Within a single query, if a reduce join is converted
to a map join, that map join can be merged into its child job. But if the
join is evaluated as a separate query, Hive has to run it as a standalone
map-only job, because it does not know that this job only produces
intermediate results. For queries 17 and 18, once each is written as a
single query, the Correlation Optimizer should be able to optimize them
(set hive.optimize.correlation=true).
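
A rough HiveQL sketch of these suggestions follows. It assumes the HIVE-600
scripts leave text-format tables here called lineitem_text and part_text
(hypothetical names). It converts the data to ORC with CREATE TABLE ... AS
SELECT, enables automatic map-join conversion and the Correlation Optimizer,
and writes TPC-H query 17 as one statement. Because Hive 0.12 does not
accept subqueries in the WHERE clause, the per-part average is computed in
a derived table and joined back on l_partkey:

```sql
-- Hypothetical source tables: lineitem_text and part_text (text format).
CREATE TABLE lineitem STORED AS ORC AS SELECT * FROM lineitem_text;
CREATE TABLE part STORED AS ORC AS SELECT * FROM part_text;

-- Let Hive convert eligible reduce joins into map joins.
SET hive.auto.convert.join=true;
-- Enable the Correlation Optimizer.
SET hive.optimize.correlation=true;

-- TPC-H Q17 as a single query: the correlated subquery is rewritten
-- as a join against a per-part aggregate of lineitem.
SELECT sum(l.l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem l
JOIN part p ON (p.p_partkey = l.l_partkey)
JOIN (
  SELECT l_partkey AS t_partkey, 0.2 * avg(l_quantity) AS t_avg_quantity
  FROM lineitem
  GROUP BY l_partkey
) t ON (t.t_partkey = l.l_partkey)
WHERE p.p_brand = 'Brand#23'
  AND p.p_container = 'MED BOX'
  AND l.l_quantity < t.t_avg_quantity;
```

The aggregate and both joins are keyed on l_partkey, which is the kind of
shared-key pattern the Correlation Optimizer targets. Running EXPLAIN on the
statement is one way to check how many MapReduce jobs Hive actually plans.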

Thanks,

Yin
On Fri, Nov 22, 2013 at 1:31 PM, Avrilia Floratou <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> I'd like to run a few TPC-H queries on Hive 0.12. I've found the TPC-H
> scripts here:
>
> https://issues.apache.org/jira/browse/HIVE-600.
>
> but noticed that these scripts were generated a long time ago. Since Hive
> could not support the full SQL-92 specification, some queries were split
> into smaller sub-queries whose results were materialized. Is there any
> change in HiveQL (in Hive 0.12) that would affect the way the TPC-H queries
> are written?
>
> Thanks,
> Avrilia
>