Pig >> mail # user >> Reduce Tasks
Re: Reduce Tasks
Setting mapred.reduce.tasks won't work, as Pig overrides it.  See http://pig.apache.org/docs/r0.10.0/perf.html#parallel for information on how to set the number of reducers in Pig.

Alan.
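
The two mechanisms discussed in this thread can be sketched in Pig Latin. This is a hedged example, not the poster's actual script: the relation names (data, grouped) and the input path are made up for illustration.

```pig
-- Load some input (path and schema are hypothetical).
data = LOAD '/user/hypothetical/input' AS (user_id:chararray, bytes:long);

-- Option 1: script-wide default reducer count (overrides the
-- engine's estimate for every reduce-side operator that follows).
SET default_parallel 200;

-- Option 2: per-operator reducer count via the PARALLEL clause;
-- this takes precedence over default_parallel for this operator.
grouped = GROUP data BY user_id PARALLEL 200;

totals = FOREACH grouped GENERATE group, SUM(data.bytes);
STORE totals INTO '/user/hypothetical/output';
```

Note that PARALLEL only affects reduce-side operators (GROUP, JOIN, ORDER, DISTINCT, and the like); map parallelism is still determined by the input splits.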

On Feb 1, 2013, at 4:53 PM, Mohit Anchlia wrote:

> A slightly different problem: I tried setting SET mapred.reduce.tasks to
> 200 in Pig, but more tasks were still launched for that job. Is there any
> other way to set the parameter?
>
> On Fri, Feb 1, 2013 at 3:15 PM, Harsha <[EMAIL PROTECTED]> wrote:
>
>>
>> It's the total number of reducers, not active reducers.
>> If you specify a lower number, each reducer gets more data to process.
>> --
>> Harsha
>>
>>
>> On Friday, February 1, 2013 at 2:54 PM, Mohit Anchlia wrote:
>>
>>> Thanks! Is there a downside to reducing the number of reducers? I am
>>> trying to alleviate high CPU usage.
>>>
>>> With fewer reducers via the PARALLEL clause, does that mean more data is
>>> processed by each reducer, or does it limit how many reducers can be
>>> active at one time?
>>>
>>> On Fri, Feb 1, 2013 at 2:44 PM, Harsha <[EMAIL PROTECTED] (mailto:
>> [EMAIL PROTECTED])> wrote:
>>>
>>>> Mohit,
>>>> you can use the PARALLEL clause to specify the number of reduce tasks.
>>>> More info here:
>>>> http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
>>>>
>>>> --
>>>> Harsha
>>>>
>>>>
>>>> On Friday, February 1, 2013 at 2:42 PM, Mohit Anchlia wrote:
>>>>
>>>>> Is there a way to specify the maximum number of reduce tasks that a
>>>>> job should spawn in a Pig script, without having to restart the
>>>>> cluster?
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>