Hive, mail # user - Questions for the future work of Hive


RE: Questions for the future work of Hive
Ashish Thusoo 2009-08-10, 19:34
Hive trunk has support for multi group by, which performs better than what 0.3.0 does.

I did not completely understand your comment that "the two mappings should take place at the same time".

Can you elaborate?

Ashish

-----Original Message-----
From: Andraz Tori [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 10, 2009 1:11 AM
To: [EMAIL PROTECTED]
Subject: Re: Questions for the future work of Hive

> 2) We don't have a short-term plan for automatic-multi-partition
> insertion. However there is a simple workaround if you know the
> partition values (and Hive can do multiple inserts in a single
> map-reduce job!). "src" can be a sub query as well.
> FROM src
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01") SELECT * WHERE
> ts = "2009-08-01"
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02") SELECT * WHERE
> ts = "2009-08-02"
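 
When the partition values are known up front, the multi-insert statement quoted above can be generated rather than typed by hand. The following is only a minimal sketch in Python: the helper name build_multi_insert and its parameters are hypothetical, it only builds the statement text (it does not talk to Hive), and it assumes the values come from a trusted source because they are interpolated without escaping.

```python
def build_multi_insert(src, tgt, pcol, filter_col, values, select="*"):
    """Build one Hive multi-insert statement with one INSERT clause per value.

    Hypothetical helper for illustration only. Values are interpolated
    as-is (no escaping), so they must come from a trusted source.
    """
    lines = ["FROM {}".format(src)]
    for value in values:
        lines.append(
            'INSERT OVERWRITE TABLE {} PARTITION({}="{}") '
            'SELECT {} WHERE {} = "{}"'.format(
                tgt, pcol, value, select, filter_col, value
            )
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Same table and column names as the quoted example.
    print(build_multi_insert("src", "tgt", "pcol", "ts",
                             ["2009-08-01", "2009-08-02"]))
```

The select argument is there so that a per-insert column list can replace "*"; a GROUP BY placed after the WHERE clause would need the template extended.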

--------------------------------------------------------

In my case src is also partitioned by "ts", which means that two mappings should take place at the same time since the data is independent, but Hive (0.3) produces a linear partition-by-partition job sequence.
I also do a group by inside every insert.
Any ideas?

This, together with the fact that hive --service thriftserver (at least in 0.3) doesn't support multiple clients, makes it very hard to run some queries effectively.
--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana www.zemanta.com
mail: [EMAIL PROTECTED]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test