Hive >> mail # user >> Questions for the future work of Hive


RE: Questions for the future work of Hive
Hive trunk has support for multi group by which performs better than what 0.3.0 does.

I did not completely understand your comment on "the two mappings should take place at the same time".

Can you elaborate?

Ashish

-----Original Message-----
From: Andraz Tori [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 10, 2009 1:11 AM
To: [EMAIL PROTECTED]
Subject: Re: Questions for the future work of Hive

> 2) We don't have a short-term plan for automatic-multi-partition
> insertion. However there is a simple workaround if you know the
> partition values (and Hive can do multiple inserts in a single
> map-reduce job!). "src" can be a sub query as well.
> FROM src
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01") SELECT * WHERE
> ts = "2009-08-01"
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02") SELECT * WHERE
> ts = "2009-08-02"

--------------------------------------------------------

In my case src, too, is partitioned by "ts", which means that two mappings should take place at the same time since the data is independent, but Hive (0.3) produces a linear partition-by-partition job sequence.
I also do group by inside every insert...
Any ideas?
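
For concreteness, a query of the shape described above might look like the following sketch. It reuses the "src", "tgt", "pcol" and "ts" names from the quoted workaround; the column "key" and the COUNT aggregate are hypothetical stand-ins for whatever the real group by computes.

```sql
-- Sketch only: "key" and COUNT(1) are assumed for illustration,
-- they do not come from the original thread.
-- src is partitioned by ts, so each WHERE clause touches one source partition.
FROM src
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-01')
  SELECT key, COUNT(1)
  WHERE ts = '2009-08-01'
  GROUP BY key
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-02')
  SELECT key, COUNT(1)
  WHERE ts = '2009-08-02'
  GROUP BY key;
```

Each insert reads a different partition of src and writes a different partition of tgt, so the two branches have no data in common.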

[this, together with the fact that hive --service thriftserver (at least in 0.3) doesn't support multiple clients, makes it very hard to effectively run some queries.]
--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana www.zemanta.com
mail: [EMAIL PROTECTED]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
