Hive >> mail # user >> Multi-GroupBy-Insert optimization
Re: Multi-GroupBy-Insert optimization
> On Fri, Jun 1, 2012 at 5:25 PM, shan s <[EMAIL PROTECTED]> wrote:
>
>> I am using Multi-GroupBy-Insert. I was expecting a single map-reduce job
>> which would club the group-bys together.
>> However, it is scheduling n jobs, where n = number of group-bys.
>> Could you please explain this behaviour?
>>
>>
>
No, it will result in at least as many jobs as there are group-bys. The
efficiency lies not in lowering the number of jobs, but in the fact that the
first job usually reduces the amount of data that the rest need to go
through. E.g. if the FROM clause includes a subquery, or if the group-bys
have similar WHERE clauses, then this "pre-selection" is executed first,
and the subsequent jobs operate on the results of the first job instead of
the entire table/partition and are therefore much faster.
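
To illustrate, here is a minimal sketch of such a query. The table and
column names (orders, totals_by_user, totals_by_country, dt, amount) are
hypothetical and only serve the example; the point is that the shared
subquery in the FROM clause plays the role of the "pre-selection", and
each INSERT carries its own GROUP BY.

```sql
-- Hypothetical tables and columns, for illustration only.
-- The subquery filters the source once; both group-bys read its output.
FROM (
  SELECT user_id, country, amount
  FROM orders
  WHERE dt = '2012-06-01'
) o
INSERT OVERWRITE TABLE totals_by_user
  SELECT o.user_id, SUM(o.amount)
  GROUP BY o.user_id
INSERT OVERWRITE TABLE totals_by_country
  SELECT o.country, SUM(o.amount)
  GROUP BY o.country;
```

Prefixing the statement with EXPLAIN should list the stages Hive plans
for it, which is a way to check how many jobs your version actually
schedules.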
J. Dolinar