Hive, mail # user - Questions for the future work of Hive

Questions for the future work of Hive
Schubert Zhang 2009-08-05, 07:06
In the Hive paper "Hive - A Warehousing Solution Over a MapReduce
Framework", section 5 describes the FUTURE WORK of Hive. I would like more
detail on the following two points:

(1) Hive currently has a naive rule-based optimizer with a small number of
simple rules. We plan to build a cost-based optimizer and adaptive
optimization techniques to come up with more efficient plans.
Q: Is the ongoing "Indexing" work part of this improvement?
Q: Are there any other improvements planned in this area?
(2) We are exploring columnar storage and more intelligent data placement to
improve scan performance.
Q: We found that current Hive cannot place data into different partitions
intelligently (we must specify the partition value in each statement). Is
intelligent/dynamic placement of data into partitions part of this
improvement? For example, we have many input files that contain records with
different timestamps, and we want to place each record into the proper
partition according to its timestamp column.
Q: Do you think Bigtable/HBase is a good columnar store that provides a
good model for intelligent data placement?
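
To make the partition-placement question in (2) concrete, here is a minimal
Python sketch of the behavior being asked about. It is not Hive code and does
not use any Hive API; the function names and the day-level partition value are
assumptions chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts):
    """Derive a day-level partition value (hypothetical format) from a timestamp."""
    return ts.strftime("%Y-%m-%d")


def place_records(records):
    """Group (timestamp, payload) records by the partition their timestamp maps to."""
    partitions = defaultdict(list)
    for ts, payload in records:
        partitions[partition_key(ts)].append(payload)
    return dict(partitions)


# One input file may mix records that belong to different partitions.
records = [
    (datetime(2009, 8, 5, 7, 6, tzinfo=timezone.utc), "a"),
    (datetime(2009, 8, 5, 18, 42, tzinfo=timezone.utc), "b"),
    (datetime(2009, 8, 6, 4, 1, tzinfo=timezone.utc), "c"),
]

placed = place_records(records)
for key in sorted(placed):
    print(key, placed[key])
```

The point of the sketch is that the partition value is computed from each
record's own timestamp column, rather than being written by hand in every
load or insert statement.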

Zheng Shao 2009-08-05, 07:26
Schubert Zhang 2009-08-05, 18:42
Ashish Thusoo 2009-08-05, 19:08
Schubert Zhang 2009-08-06, 04:01
Andraz Tori 2009-08-10, 08:11
Ashish Thusoo 2009-08-10, 19:34