Hadoop, mail # user - guessing number of reducers.


Re: guessing number of reducers.
Manoj Babu 2012-11-22, 04:45
Thank you for the info Bejoy.

Cheers!
Manoj.

On Thu, Nov 22, 2012 at 12:04 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

> Hi Manoj
>
> If you intend to calculate the number of reducers based on the input size,
> then in your driver class you should get the size of the input dir in HDFS.
> Say you intend to give n bytes to each reducer; then the number of
> reducers can be computed as
> total input size / bytes per reducer.
>
> You can round this value and use it to set the number of reducers in the
> conf programmatically.
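
A minimal sketch of that calculation in plain Java, with no Hadoop dependency so it runs on its own. The class name `ReducerEstimator` and the method `reducersForInput` are hypothetical names chosen for illustration. It rounds up rather than to the nearest integer (an assumption, so that no reducer gets more than the target size) and never returns fewer than 1. In a real driver the total size would typically come from `FileSystem.getContentSummary(inputDir).getLength()` and the result would go to `job.setNumReduceTasks(n)`.

```java
final class ReducerEstimator {
    private ReducerEstimator() {}

    /**
     * Returns ceil(totalInputBytes / bytesPerReducer), but at least 1.
     * Both parameter names are illustrative, not Hadoop configuration keys.
     */
    static int reducersForInput(long totalInputBytes, long bytesPerReducer) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Integer ceiling division written to avoid overflow on large inputs.
        long n = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        // Clamp to the int range expected by a reducer-count setter.
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // The two daily volumes mentioned in this thread, at 1 GB per reducer.
        System.out.println(reducersForInput(500 * gb, gb)); // prints 500
        System.out.println(reducersForInput(100 * gb, gb)); // prints 100
    }
}
```

Because the count is recomputed from the input size on every run, a 500 GB day and a 100 GB day get different reducer counts without any change to the job configuration files.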
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> *From: * Manoj Babu <[EMAIL PROTECTED]>
> *Date: *Wed, 21 Nov 2012 23:28:00 +0530
> *To: *<[EMAIL PROTECTED]>
> *Cc: *[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> *Subject: *Re: guessing number of reducers.
>
> Hi,
>
> How can I set the number of reducers in the job conf dynamically?
> For example, on heavy-traffic days I get 500 GB of data, and on other
> days only 100 GB.
>
> Thanks in advance!
>
> Cheers!
> Manoj.
>
>
>
> On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>
>>  Bejoy,
>>
>>
>>
>> I’ve read somewhere about keeping the number of mapred.reduce.tasks below
>> the reduce task capacity. Here is what I just tested:
>>
>>
>>
>> Output: 25 GB. 8-DataNode cluster with a Map and Reduce Task Capacity of 16:
>>
>>  1 reducer   – 22 min
>>  4 reducers  – 11.5 min
>>  8 reducers  – 5 min
>> 10 reducers  – 7 min
>> 12 reducers  – 6.5 min
>> 16 reducers  – 5.5 min
>>
>>
>>
>> 8 reducers won the race, but running reducers at max capacity came very
>> close. :)
>>
>>
>>
>> AK47
>>
>>
>>
>>
>>
>> *From:* Bejoy KS [mailto:[EMAIL PROTECTED]]
>> *Sent:* Wednesday, November 21, 2012 11:51 AM
>> *To:* [EMAIL PROTECTED]
>> *Subject:* Re: guessing number of reducers.
>>
>>
>>
>> Hi Sasha
>>
>> In general, the number of reduce tasks is chosen mainly based on the volume
>> of data going into the reduce phase. Tools like Hive and Pig by default
>> allocate one reducer for every 1 GB of map output, so 100 GB of map
>> output means 100 reducers.
>> If your tasks are more CPU intensive, then you need a smaller volume of
>> data per reducer for better performance.
>>
>> In general it is better to have the number of reduce tasks slightly less
>> than the number of available reduce slots in the cluster.
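
The two rules of thumb above (size reducers by data volume, then stay slightly under the available reduce slots) could be combined roughly as follows. This is only a sketch in plain Java: the class `ReducerHeuristic`, the method `suggestReducers`, and the `slotsToLeaveFree` parameter are hypothetical names, and how many slots count as "slightly less" is an assumption left to the caller.

```java
final class ReducerHeuristic {
    static final long GB = 1024L * 1024L * 1024L;

    private ReducerHeuristic() {}

    /**
     * Picks a reducer count from the map output volume, then caps it
     * below the cluster's reduce slot count.
     * slotsToLeaveFree is an illustrative knob, not a Hadoop setting.
     */
    static int suggestReducers(long mapOutputBytes, long bytesPerReducer,
                               int reduceSlots, int slotsToLeaveFree) {
        if (mapOutputBytes < 0 || bytesPerReducer <= 0
                || reduceSlots < 1 || slotsToLeaveFree < 0) {
            throw new IllegalArgumentException("invalid argument");
        }
        // Rule 1: one reducer per bytesPerReducer of map output, rounded up.
        long byVolume = mapOutputBytes / bytesPerReducer
                + (mapOutputBytes % bytesPerReducer == 0 ? 0 : 1);
        byVolume = Math.max(1L, byVolume);
        // Rule 2: stay slightly under the available reduce slots.
        int cap = Math.max(1, reduceSlots - slotsToLeaveFree);
        return (int) Math.min(byVolume, (long) cap);
    }

    public static void main(String[] args) {
        // 100 GB of map output on a 16-slot cluster: capped by the slots.
        System.out.println(suggestReducers(100 * GB, GB, 16, 1)); // prints 15
        // 4 GB of map output on the same cluster: limited by data volume.
        System.out.println(suggestReducers(4 * GB, GB, 16, 1));   // prints 4
    }
}
```

For CPU-intensive reduce tasks, passing a smaller `bytesPerReducer` raises the volume-based count, while the slot cap still keeps the job within a single wave of reducers.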
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>  ------------------------------
>>
>> *From: *jamal sasha <[EMAIL PROTECTED]>
>>
>> *Date: *Wed, 21 Nov 2012 11:38:38 -0500
>>
>> *To: *[EMAIL PROTECTED]<[EMAIL PROTECTED]>
>>
>> *ReplyTo: *[EMAIL PROTECTED]
>>
>> *Subject: *guessing number of reducers.
>>
>>
>>
>> By default the number of reducers is set to 1.
>> Is there a good way to guess the optimal number of reducers?
>> Say I have TBs worth of data, and the mappers are on the order of 5000 or
>> so.
>> But ultimately I am calculating, let's say, some average over the whole
>> data, say the average transaction occurring.
>> Now the output will be just one line in one "part" file; the rest of them
>> will be empty. So I am guessing I need loads of reducers, but then most of
>> them will be empty, yet at the same time one reducer won't suffice.
>> What's the best way to solve this?
>> How do I guess the optimal number of reducers?
>> Thanks