NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hive >> mail # user >> Hive Query Unable to distribute load evenly in reducers


Re: Hive Query Unable to distribute load evenly in reducers
Is your data heavily skewed towards certain values of a.x, etc.?
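A minimal HiveQL check can answer that question. Count the rows per join key and look for a handful of values that dominate. Include NULL in that check, because rows with a NULL join key tend to be sent to the same reducer. `tableA`, `x`, `tableB` and `y` are the placeholder names from the example query quoted below, not real tables, so substitute your own.

```sql
-- Top join-key values by row count; a few keys with counts far above
-- the rest indicate skew (placeholder table/column names from the thread).
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;

-- Same check on the other side of the join.
SELECT y, COUNT(*) AS cnt
FROM tableB
GROUP BY y
ORDER BY cnt DESC
LIMIT 20;

-- NULL keys are worth counting separately.
SELECT COUNT(*) FROM tableA WHERE x IS NULL;
```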

On 15 October 2012 15:23, Saurabh Mishra <[EMAIL PROTECTED]> wrote:
> The queries are simple joins, something along the lines of:
>
>   select a, b, c, count(D) from tableA join tableB on a.x=b.y join ...
>   group by a, b, c;
>
>
>> From: [EMAIL PROTECTED]
>> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> To: [EMAIL PROTECTED]
>
>>
>> And your queries were?
>>
>> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I am running some Hive queries that join tables containing up to 30 million
>> > records each. Since the load on the reducers is very significant in
>> > these cases, I specifically set the following parameters before executing
>> > the queries:
>> >
>> > set mapred.reduce.tasks=100;
>> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> > set hive.optimize.cp=true;
>> >
>> > The number of reducers the job spawns is now 160, but despite the high
>> > number, most of the load remains on 1 or 2 reducers. In the final
>> > statistics, 158 reducers completed within 2-3 minutes of starting, while
>> > 2 reducers took 2 hours to run.
>> > Is there any way to overcome this load distribution disparity?
>> > Any help in this regard will be highly appreciated.
>> >
>> > Sincerely
>> > Saurabh Mishra
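If a few join or group-by keys turn out to dominate, Hive has settings aimed at this case. The sketch below assumes a Hive release that supports them, so verify the property names and defaults against the version you run. `hive.optimize.skewjoin` detects join keys with more than `hive.skewjoin.key` rows and processes them in a follow-up job instead of on a single reducer. `hive.groupby.skewindata` splits the GROUP BY into two stages. In the first stage, rows are spread randomly across reducers and partially aggregated. The threshold of 100000 is only an example value.

```sql
-- Skewed join keys: rows for keys exceeding the threshold are set aside
-- and joined in a follow-up job rather than piling onto one reducer.
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;

-- Skewed GROUP BY: stage 1 distributes rows randomly for partial
-- aggregation; stage 2 combines the partial results by the group-by key.
set hive.groupby.skewindata=true;
```

These settings trade extra MapReduce stages for a more even reducer load. They help only when the slow reducers are caused by hot keys, which is why checking the key distribution first is worthwhile.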