
Sukhendu Chakraborty 2012-11-15, 20:37
HIVE optimizer enhancements in 0.9.0+ releases

I am a HIVE user working on analytical applications over large
data sets. For us, HIVE performance is critical to the success of
our product. I was wondering whether any recent improvements have been
made in the optimizer layer. One of the relevant references I
found on the web is the HIVE paper
(http://infolab.stanford.edu/~ragho/hive-icde2010.pdf). If you can
send me any pointers to current enhancements, that would be great.

Some specific improvements I am looking for are:
1. Cost-based optimization (logical or physical)
2. "multi-query optimization techniques and performing generic n-way
joins in a single map-reduce job" (quoted from the future work section
of the paper above)
3. Generating and using table statistics to produce better
plans, faster execution, etc. I know some code was added to
generate column statistics for HIVE tables. Any other statistics

Thanks for your help,