Hive, mail # user - Questions for the future work of Hive


Questions for the future work of Hive
Schubert Zhang 2009-08-05, 07:06
Section 5 of the Hive paper, "Hive - A Warehousing Solution Over a MapReduce
Framework", describes the FUTURE WORK of Hive. I would like more detail on
the following two points:

(1) Hive currently has a naive rule-based optimizer with a small number of
simple rules. We plan to build a cost-based optimizer and adaptive
optimization techniques to come up with more efficient plans.
Q: Is the ongoing "Indexing" work part of this improvement?
Q: Is any other work planned in this area?
(2) We are exploring columnar storage and more intelligent data placement to
improve scan performance.
Q: We found that Hive currently cannot place data into different partitions
intelligently (we must specify the partition value in each statement). Is
intelligent/dynamic placement into partitions part of this improvement? For
example, we have many input files containing records with different
timestamps, and we want to place each record into the proper partition
according to its timestamp column.
Q: Do you think Bigtable/HBase is a good columnar store that provides a
good model for intelligent data placement?
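
To make the partitioning example under (2) concrete, here is a minimal Python
sketch of the behavior we have in mind. It is not Hive code and says nothing
about how Hive would implement it. The record layout (an epoch-seconds
timestamp followed by a payload), the function name partition_by_day, and the
dt=YYYY-MM-DD partition key format are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(records):
    """Group (timestamp, payload) records by the UTC day of their timestamp.

    Hypothetical helper: the partition value is derived from each record
    instead of being written by hand in the load statement.
    """
    partitions = defaultdict(list)
    for ts, payload in records:
        # Derive the partition key from the record's own timestamp column.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        partitions["dt=" + day].append((ts, payload))
    return dict(partitions)


# Assumed sample input: records from two different days mixed in one file.
records = [
    (1249455960, "a"),
    (1249516799, "b"),
    (1249516800, "c"),
]

for key, rows in sorted(partition_by_day(records).items()):
    print(key, rows)
```

The point is only that the partition each record lands in is computed from
the data, so one load can fill several partitions at once.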

Schubert