Saurabh Mishra 2012-10-15, 12:09
MiaoMiao 2012-10-15, 13:10
Saurabh Mishra 2012-10-15, 14:23
Is your data heavily skewed towards certain values of a.x etc?
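One way to check is to count rows per join-key value. This is only a sketch: it reuses the placeholder names tableA and x from the query quoted below, and the LIMIT of 20 is an arbitrary choice. A handful of values with counts far above the rest would explain why one or two reducers receive most of the rows.

```sql
-- Count rows per join-key value in tableA (placeholder names from the quoted query).
-- A few very large row_count values at the top indicate a skewed join key.
SELECT a.x AS join_key, COUNT(*) AS row_count
FROM tableA a
GROUP BY a.x
ORDER BY row_count DESC
LIMIT 20;
```

Running the same query against b.y in tableB shows whether the skew is on one side of the join or both. NULL or default values in the key column are worth looking for, since all such rows hash to the same reducer.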
On 15 October 2012 15:23, Saurabh Mishra <[EMAIL PROTECTED]> wrote:
> The queries are simple joins, something along the lines of
> select a, b, c, count(D) from tableA join tableB on a.x=b.y join.... group
> by a, b,c;
>> From: [EMAIL PROTECTED]
>> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> To: [EMAIL PROTECTED]
>> And your queries were?
>> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I am running some Hive queries joining tables containing up to 30 million
>> > records each. Since the load on the reducers is very significant in
>> > these cases, I specifically set the following parameters before
>> > executing the queries:
>> > set mapred.reduce.tasks=100;
>> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> > set hive.optimize.cp=true;
>> > The number of reducers the job spawns is now 160, but despite the high
>> > number, most of the load remains on 1 or 2 reducers. Hence, in the final
>> > statistics, 158 reducers completed within 2-3 minutes of starting and 2
>> > reducers took 2 hrs to run.
>> > Is there any way to overcome this load distribution disparity?
>> > Any help in this regard will be highly appreciated.
>> > Sincerely
>> > Saurabh Mishra
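If the join or group-by keys do turn out to be skewed, Hive has settings aimed at this case. The fragment below is a sketch rather than a confirmed fix for this job: the threshold value is an assumption that would need tuning, and whether the settings help depends on the Hive version and on where the skew actually is.

```sql
-- Handle heavily repeated join keys in a separate follow-up map-join job.
set hive.optimize.skewjoin=true;
-- Row count per key above which a key is treated as skewed (assumed value; tune for the data).
set hive.skewjoin.key=100000;
-- Split the GROUP BY into two jobs so that skewed group keys are first spread across reducers.
set hive.groupby.skewindata=true;
```

Raising mapred.reduce.tasks alone does not help here, because every row with the same key value is still sent to the same reducer.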
Saurabh Mishra 2012-10-15, 20:45
Navis류승우 2012-10-16, 05:17
Saurabh Mishra 2012-10-16, 05:53
Saurabh Mishra 2012-10-18, 08:56
Philip Tromans 2012-10-18, 09:03