
Hive, mail # user - Hive Query Unable to distribute load evenly in reducers


Saurabh Mishra 2012-10-15, 12:09
MiaoMiao 2012-10-15, 13:10
Saurabh Mishra 2012-10-15, 14:23
Philip Tromans 2012-10-15, 15:29
Saurabh Mishra 2012-10-15, 20:45
Navis류승우 2012-10-16, 05:17
Saurabh Mishra 2012-10-16, 05:53
Saurabh Mishra 2012-10-18, 08:56
Re: Hive Query Unable to distribute load evenly in reducers
Philip Tromans 2012-10-18, 09:03
I'm really not convinced that there's no skew in your data. Look at
the counters on the Hadoop TaskTracker pages and check carefully that
the reduce input records, reduce input groups and reduce output records
are similar across all reducers.

Phil.
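Besides the TaskTracker counters, the distribution of join keys can be checked directly in Hive. A minimal sketch, assuming `tableA` and its join column `x` are the placeholder names used in the query quoted further down. If a handful of keys account for a large share of the rows, all rows carrying the same key are sent to the same reducer. NULL keys show up here as a single group and are worth checking too. Since tableA is exploded in the real query, running the same count over the exploded rows gives a more faithful picture.

```sql
-- Rows per join key in tableA, heaviest keys first.
-- tableA and x are placeholder names taken from this thread.
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;
```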

On 18 October 2012 09:56, Saurabh Mishra <[EMAIL PROTECTED]> wrote:
> Any views on the problem?
>
> ________________________________
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: Hive Query Unable to distribute load evenly in reducers
> Date: Tue, 16 Oct 2012 11:23:29 +0530
>
>
> If by using MapJoin you mean setting
> set hive.auto.convert.join=true;
> then I am already using this configuration, but to no avail. :(
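`hive.auto.convert.join=true` only lets Hive turn a join into a map join when it judges the small side to be under its size threshold, so it is worth confirming what the plan actually contains. A minimal sketch, assuming hypothetical column names (`col1`, `col2`, `d`) in place of the elided ones in the original query:

```sql
set hive.auto.convert.join=true;

-- col1, col2 and d are hypothetical columns; tableB joins on tableA.x = tableB.y.
-- In the output, "Map Join Operator" marks a map-side join, while
-- "Join Operator" marks a reduce-side join that shuffles tableA by x.
EXPLAIN
SELECT a.col1, b.col2, COUNT(a.d)
FROM tableA a
JOIN tableB b ON a.x = b.y
GROUP BY a.col1, b.col2;
```

When the conversion is decided at run time, the plan can list both alternatives; the task logs then show which one actually ran.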
>
> ________________________________
> Date: Tue, 16 Oct 2012 14:17:47 +0900
> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> How about using MapJoin?
>
> 2012/10/16 Saurabh Mishra <[EMAIL PROTECTED]>
>
> No, there is apparently no heavy skew. Another statistic I wanted to
> point out: these are the approximate numbers of records in the tables of
> this 4-table join query:
> tableA: 170 million (actual number; I am also exploding these records, so
> the effective number could be much higher)
> tableB: 15
> tableC: 45
> tableD: 45
> tableE: 45
> tableF: 14000
>
> Also, I cannot put any filter condition on tableA; the situation does not
> permit it. :(
> Kindly suggest an alternative solution or some Hive configuration that
> distributes the load across the reducers better.
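With tableB through tableF holding at most about 14,000 records each, every one of them should fit in memory on each mapper, so none of these joins needs to shuffle tableA by join key. A minimal sketch that requests this explicitly with the `MAPJOIN` hint, assuming hypothetical join and output columns (`b_id`, `c_id`, `col1`, `metric`, and so on) since the real ones are not given. Later Hive releases ignore this hint by default and rely on automatic conversion instead.

```sql
-- Hypothetical columns; tableB..tableF are the small tables listed above.
-- The hint asks Hive to load the listed tables into memory on each mapper,
-- so the joins run map-side and tableA is not partitioned by join key.
SELECT /*+ MAPJOIN(b, c, d, e, f) */
       a.col1, b.col2, f.col3, COUNT(a.metric)
FROM tableA a
JOIN tableB b ON a.b_id = b.id
JOIN tableC c ON a.c_id = c.id
JOIN tableD d ON a.d_id = d.id
JOIN tableE e ON a.e_id = e.id
JOIN tableF f ON a.f_id = f.id
GROUP BY a.col1, b.col2, f.col3;
```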
>
>> Date: Mon, 15 Oct 2012 16:29:56 +0100
>
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>
>>
>> Is your data heavily skewed towards certain values of a.x, etc.?
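If some values of a.x are heavy, Hive's skew-join handling can take them out of the regular reduce-side join. A minimal sketch of the relevant settings; the threshold shown is the default, and a suitable value depends on the data:

```sql
-- Enable runtime skew handling for reduce-side joins: rows for any key seen
-- more than hive.skewjoin.key times in the join operator are written aside
-- and joined in a follow-up map-join job instead of by a single reducer.
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;
```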
>>
>> On 15 October 2012 15:23, Saurabh Mishra <[EMAIL PROTECTED]>
>> wrote:
>> > The queries are simple joins, something along the lines of
>> > select a, b, c, count(D) from tableA join tableB on a.x=b.y join....
>> > group by a, b, c;
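Even when every join runs map-side, `group by a, b, c` still needs a reduce phase keyed on the grouping columns, so a heavy (a, b, c) combination still lands on a single reducer. A minimal sketch of the Hive settings aimed at this case: `hive.map.aggr` pre-aggregates on the mappers, and `hive.groupby.skewindata` splits the aggregation into two jobs.

```sql
-- Partially aggregate in the mappers before the shuffle.
set hive.map.aggr=true;
-- Two-stage GROUP BY: the first job spreads rows randomly over reducers and
-- computes partial aggregates; the second merges the partials per group key.
set hive.groupby.skewindata=true;
```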
>> >
>> >
>> >> From: [EMAIL PROTECTED]
>> >> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> >> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> >> To: [EMAIL PROTECTED]
>> >
>> >>
>> >> And your queries were?
>> >>
>> >> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Hi,
>> >> > I am running some Hive queries that join tables containing up to
>> >> > 30 million records each. Since the load on the reducers is very
>> >> > significant in these cases, I specifically set the following
>> >> > parameters before executing the queries:
>> >> >
>> >> > set mapred.reduce.tasks=100;
>> >> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> >> > set hive.optimize.cp=true;
>> >> >
>> >> > The number of reducers the job spawns is now 160, but despite the
>> >> > high number, most of the load remains on 1 or 2 reducers. In the final
>> >> > statistics, 158 reducers completed within 2-3 minutes of starting,
>> >> > while 2 reducers took 2 hours to run.
>> >> > Is there any way to overcome this uneven load distribution?
>> >> > Any help in this regard will be highly appreciated.
>> >> >
>> >> > Sincerely
>> >> > Saurabh Mishra
>
>
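Raising `mapred.reduce.tasks` adds reducers but does not change how rows are assigned: each row goes to a reducer chosen by hashing its key, so all rows for one hot key still reach one reducer. Salting the key spreads a hot group over several reducers. A minimal sketch of a manually salted two-stage count, assuming a hypothetical intermediate table `joined_rows` holding the joined (and exploded) rows, hypothetical columns `col1`..`col3` and `metric`, and an arbitrary fan-out of 16:

```sql
-- Stage 1: attach a random salt (0..15) so each (col1, col2, col3) group is
-- split across up to 16 reduce keys, and count each slice separately.
-- Stage 2: sum the partial counts back into one row per group.
-- joined_rows, col1..col3 and metric are hypothetical names.
SELECT col1, col2, col3, SUM(partial_cnt) AS cnt
FROM (
  SELECT col1, col2, col3, salt, COUNT(metric) AS partial_cnt
  FROM (
    SELECT col1, col2, col3, metric,
           CAST(FLOOR(RAND() * 16) AS INT) AS salt
    FROM joined_rows
  ) salted
  GROUP BY col1, col2, col3, salt
) partials
GROUP BY col1, col2, col3;
```

The second stage sees at most 16 partial rows per group, so even the heaviest group is cheap to finish on one reducer.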