

Saurabh Mishra 2012-10-15, 12:09
MiaoMiao 2012-10-15, 13:10
Saurabh Mishra 2012-10-15, 14:23
Philip Tromans 2012-10-15, 15:29
Saurabh Mishra 2012-10-15, 20:45
Navis류승우 2012-10-16, 05:17
Saurabh Mishra 2012-10-16, 05:53
Saurabh Mishra 2012-10-18, 08:56

Re: Hive Query Unable to distribute load evenly in reducers
I'm really not convinced that there's no skew in your data. Look at
the counters from the Hadoop TaskTracker pages, and thoroughly check
that the numbers of reducer input records / groups and output records
are all similar.

Phil.
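
One way to test for the kind of skew described above, alongside the task counters, is to count rows per join key. This is only a sketch: it assumes the join column is `x` on `tableA`, as in the query outline quoted further down, and the real column names will differ. If the join key comes from exploded values, the count would need to be taken after the explode.

```sql
-- Hypothetical skew check: tableA and column x are taken from the query
-- outline in this thread; substitute the real table and join key.
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;
```

If one or two keys carry most of the rows, those rows all hash to the same reducers, which would match the pattern of 158 fast reducers and 2 slow ones.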

On 18 October 2012 09:56, Saurabh Mishra <[EMAIL PROTECTED]> wrote:
> Any views on the problem?
>
> ________________________________
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: Hive Query Unable to distribute load evenly in reducers
> Date: Tue, 16 Oct 2012 11:23:29 +0530
>
>
> If by MapJoin you mean setting
> set hive.auto.convert.join=true;
> then I am already using this configuration, but to no avail... :(
>
> ________________________________
> Date: Tue, 16 Oct 2012 14:17:47 +0900
> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> How about using MapJoin?
>
> 2012/10/16 Saurabh Mishra <[EMAIL PROTECTED]>
>
> No, there is apparently no heavy skew. Another statistic I wanted to point
> out: the approximate row counts of the tables in this 4 table join query are:
> tableA: 170 million (actual number; I am also exploding these records, so
> the effective number could be much higher)
> tableB: 15
> tableC: 45
> tableD: 45
> tableE: 45
> tableF: 14000
>
> Also, I cannot put any filter condition on tableA; the situation does not
> permit it. :(
> Kindly suggest an alternative solution or a Hive configuration that
> distributes the load better across the reducers.
>
>> Date: Mon, 15 Oct 2012 16:29:56 +0100
>
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>
>>
>> Is your data heavily skewed towards certain values of a.x etc?
>>
>> On 15 October 2012 15:23, Saurabh Mishra <[EMAIL PROTECTED]>
>> wrote:
>> > The queries are simple joins, something on the lines of
>> > select a, b, c, count(D) from tableA join tableB on a.x=b.y join....
>> > group by a, b, c;
>> >
>> >
>> >> From: [EMAIL PROTECTED]
>> >> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> >> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> >> To: [EMAIL PROTECTED]
>> >
>> >>
>> >> And your queries were?
>> >>
>> >> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Hi,
>> >> > I am running some Hive queries that join tables containing up to
>> >> > 30 million records each. Since the load on the reducers is very
>> >> > significant in these cases, I set the following parameters before
>> >> > executing the queries:
>> >> >
>> >> > set mapred.reduce.tasks=100;
>> >> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> >> > set hive.optimize.cp=true;
>> >> >
>> >> > The number of reducers the job spawns is now 160, but despite the high
>> >> > number, most of the load remains on 1 or 2 reducers. In the final
>> >> > statistics, 158 reducers completed within 2-3 minutes of starting and
>> >> > 2 reducers took 2 hrs to run.
>> >> > Is there any way to overcome this disparity in load distribution?
>> >> > Any help in this regard will be highly appreciated.
>> >> >
>> >> > Sincerely
>> >> > Saurabh Mishra
>
>
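
For reference, the MapJoin and skew suggestions in this thread map onto a handful of Hive settings and one query hint. The following is a sketch under assumptions, not a confirmed fix for this job: the table aliases and columns (`tableA`, `tableB`, `a.x`, `b.y`) come from the query outline above, the numeric values are illustrative, and whether any of these help depends on the Hive version and on where the skew actually is.

```sql
-- Sketch only: names come from the query outline in this thread;
-- numeric values are illustrative, not tuned recommendations.

-- Let Hive convert joins against small tables into map-side joins.
SET hive.auto.convert.join=true;
-- Size limit (bytes) below which a table counts as small for conversion.
SET hive.mapjoin.smalltable.filesize=25000000;

-- Process heavily repeated join keys in a follow-up map-join job.
SET hive.optimize.skewjoin=true;
-- Row count per key above which the key is treated as skewed.
SET hive.skewjoin.key=100000;

-- Spread a skewed GROUP BY over two MapReduce stages.
SET hive.groupby.skewindata=true;

-- Explicit map-join hint for one small table.
SELECT /*+ MAPJOIN(b) */ a.x, COUNT(*) AS cnt
FROM tableA a
JOIN tableB b ON a.x = b.y
GROUP BY a.x;
```

With tables of 15 to 14000 rows on one side, as listed above, a map-side join avoids shuffling the large table by join key at all, so any remaining reducer imbalance would come from the GROUP BY keys.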