Hive >> mail # user >> Questions for the future work of Hive


Questions for the future work of Hive
In the Hive paper <Hive - A Warehousing Solution Over a MapReduce
Framework>, Section 5 describes the future work of Hive. I would like more
detail on the following two points:

(1) Hive currently has a naive rule-based optimizer with a small number of
simple rules. We plan to build a cost-based optimizer and adaptive
optimization techniques to come up with more efficient plans.
Q: Is the ongoing "Indexing" work part of this improvement?
Q: Is anything else planned?
(2) We are exploring columnar storage and more intelligent data placement to
improve scan performance.
Q: We found that current Hive cannot place data into different partitions
intelligently (we must specify the partition value in each statement). Is
intelligent/dynamic placement of partitions part of this improvement? For
example, we have many input files, each containing records with different
timestamps, and we want to place each record into the proper partition
according to its timestamp column.
Q: Do you think Bigtable/HBase is a good columnar store that provides a
good model for intelligent data placement?
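
To make the placement example in (2) concrete, here is a minimal sketch in
Python of the behaviour we have in mind. It is only an illustration, not Hive
code: the field name "ts", the daily "dt=YYYY-MM-DD" partition format, and the
functions partition_key and place_records are all hypothetical, and the
timestamps are assumed to be epoch seconds in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(record, ts_field="ts"):
    """Derive a daily partition value (e.g. 'dt=2009-08-05') from a record.

    Assumes record[ts_field] holds epoch seconds, interpreted as UTC.
    """
    day = datetime.fromtimestamp(record[ts_field], tz=timezone.utc)
    return "dt=" + day.strftime("%Y-%m-%d")


def place_records(records, ts_field="ts"):
    """Group records by the partition derived from their timestamp column,
    so that no partition value has to be written out by hand."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_key(record, ts_field)].append(record)
    return dict(partitions)


# Hypothetical input: records from one file that span two days.
records = [
    {"ts": 1249457160, "msg": "a"},
    {"ts": 1249497720, "msg": "b"},
    {"ts": 1249531260, "msg": "c"},
]

for key, rows in sorted(place_records(records).items()):
    print(key, [row["msg"] for row in rows])
```

The point is that the partition value is computed from each record rather
than named in the load statement, which is what we cannot do today.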

Schubert
Zheng Shao 2009-08-05, 07:26
Schubert Zhang 2009-08-05, 18:42
Ashish Thusoo 2009-08-05, 19:08
Schubert Zhang 2009-08-06, 04:01
Andraz Tori 2009-08-10, 08:11
Ashish Thusoo 2009-08-10, 19:34