RE: why insert overwrite table tmp partition(dt=1) select bar, foo from pokes NEEDS 2 MR JOBS?
Not sure if this got answered. The second MR job in this case concatenates the outputs, so that far fewer files are generated than there are mappers. This benefits jobs that later consume the data. The feature was added recently. You can, however, turn it off using the following configuration variable:

hive.merge.mapfiles=false

This variable is true by default.
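
As a rough sketch of how this could be tried from the Hive CLI (assuming the variable can be set per session with `set`, and reusing the `tmp` and `pokes` tables from the original question), something like the following should do:

```sql
-- Turn off the merge step for map-only jobs, for this session only
set hive.merge.mapfiles=false;

-- Re-check the plan for the query from the original question
explain insert overwrite table tmp partition(dt=1) select bar, foo from pokes;
```

With the merge turned off, the plan would presumably no longer include the second MR job, at the cost of leaving one output file per mapper.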

Ashish
________________________________
From: Min Zhou [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 03, 2009 8:02 PM
To: hive-user
Subject: why insert overwrite table tmp partition(dt=1) select bar, foo from pokes NEEDS 2 MR JOBS?

I thought a single map-only job would be enough. Try:
hive> explain insert overwrite table tmp partition(dt=1) select bar, foo from pokes;
Thanks,
Min
--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com