Hive >> mail # user >> Multi Table Inserts produces multiple jobs


Re: Multi Table Inserts produces multiple jobs
Hi Cristi,

In a multi-insert, source_table is scanned only once. If you run 2 separate queries, it is scanned twice.

If you run 'explain extended' on the query, you can see how the data flows through each stage.
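For example, something along these lines. The COUNT(*) aggregates are an assumption, added only so that each GROUP BY is a valid query; the table and column names come from your mail.

```sql
-- COUNT(*) is a hypothetical aggregate used for illustration only.
-- EXPLAIN EXTENDED prints the stage dependencies and the per-stage
-- plans without actually running the query.
EXPLAIN EXTENDED
FROM source_table
INSERT OVERWRITE TABLE tablename1
  SELECT field1, field2, COUNT(*)
  GROUP BY field1, field2
INSERT OVERWRITE TABLE tablename2
  SELECT field1, field3, COUNT(*)
  GROUP BY field1, field3;
```

In the output, check the "Alias -> Map Operator Tree" of each Map Reduce stage. If source_table shows up under only one stage, the table should be read only once, even though more than one job is launched.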

You can find related info at http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook (slides 51-53).

-Thiruvel

On Aug 24, 2010, at 9:18 PM, Cristi Cioriia wrote:

> Hi guys,
>
> I would like to use the Multi Insert feature of Hive so that I get
> fewer map-reduce jobs than when running separate queries.
>
> I have some Hive queries that use the Multi Insert feature as below:
>
> FROM source_table
> INSERT OVERWRITE TABLE tablename1
> SELECT field1, field2 ...fieldN
> GROUP BY field1, field2
> INSERT OVERWRITE TABLE tablename2
> SELECT field1, field3 ... fieldK
> GROUP BY field1, field3
>
> I was hoping that by using this feature only 1 Map-Reduce job would be
> created, but when running the query I found that 2 jobs are
> created, just as if I had run 2 separate queries:
>
> FROM source_table
> INSERT OVERWRITE TABLE tablename1
> SELECT field1, field2 ...fieldN
> GROUP BY field1, field2
>
> FROM source_table
> INSERT OVERWRITE TABLE tablename2
> SELECT field1, field3 ... fieldK
> GROUP BY field1, field3
>
> Is there any way that I can get only 1 MR job with the multi insert
> syntax?
>
> Thanks,
> Cristi