Hadoop >> mail # user >> guessing number of reducers.


jamal sasha 2012-11-21, 16:38
Bejoy KS 2012-11-21, 16:50
Kartashov, Andy 2012-11-21, 17:49
Bejoy KS 2012-11-21, 18:21
jamal sasha 2012-11-21, 18:27
Mohammad Tariq 2012-11-21, 18:04
Manoj Babu 2012-11-21, 17:58

Re: guessing number of reducers.
Hi Manoj

If you intend to calculate the number of reducers based on the input size, then in your driver class you should get the size of the input directory in HDFS. Say you intend to give n bytes to each reducer; the number of reducers can then be computed as

Total input size / bytes per reducer.

You can round this value and use it to set the number of reducers on the job conf programmatically, as in the sketch below.
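
For illustration, a minimal sketch of that calculation (the class name and the 1 GB-per-reducer target are made up for the example; the input size comes from FileSystem.getContentSummary()):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReducerEstimator {
    // Illustrative target: hand roughly 1 GB of input to each reducer.
    private static final long BYTES_PER_REDUCER = 1024L * 1024 * 1024;

    /** Total input size / bytes per reducer, rounded up, never less than 1. */
    public static int estimateReducers(Configuration conf, Path inputDir) throws IOException {
        // Total size in bytes of everything under the input directory in HDFS.
        long totalBytes = FileSystem.get(conf).getContentSummary(inputDir).getLength();
        return (int) Math.max(1, (totalBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER);
    }
}

The driver would then call job.setNumReduceTasks(ReducerEstimator.estimateReducers(conf, inputDir)) before submitting the job.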

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Manoj Babu <[EMAIL PROTECTED]>
Date: Wed, 21 Nov 2012 23:28:00
To: <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Subject: Re: guessing number of reducers.

Hi,

How do I set the number of reducers in the job conf dynamically?
For example, on some days of heavy traffic I get 500GB of data, and on other
days only 100GB.

Thanks in advance!

Cheers!
Manoj.

On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:

>  Bejoy,
>
>
>
> I've read somewhere about keeping the number of mapred.reduce.tasks below the
> reduce task capacity. Here is what I just tested:
>
>
>
> Output 25GB. 8-DN cluster with 16 map and reduce task capacity:
>
>
>
> 1 Reducer   - 22 mins
> 4 Reducers  - 11.5 mins
> 8 Reducers  - 5 mins
> 10 Reducers - 7 mins
> 12 Reducers - 6.5 mins
> 16 Reducers - 5.5 mins
>
>
>
> 8 Reducers won the race, but reducers at the max capacity were very close. :)
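
One way to run a sweep like this without touching code between runs is a ToolRunner-based driver that leaves the reducer count to the configuration, so each run can pass a different -D mapred.reduce.tasks=N on the command line. A rough sketch (the class name is made up; identity map and reduce are the defaults, so only the paths are set):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SweepDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner/GenericOptionsParser has already folded -D options into
        // getConf(), so -D mapred.reduce.tasks=N from the command line applies.
        Job job = new Job(getConf(), "reducer-sweep");
        job.setJarByClass(SweepDriver.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // No job.setNumReduceTasks() here: that would override the -D value.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new SweepDriver(), args));
    }
}

Each run then becomes something like: hadoop jar job.jar SweepDriver -D mapred.reduce.tasks=8 <input> <output>.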
>
>
>
> AK47
>
>
>
>
>
> From: Bejoy KS [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, November 21, 2012 11:51 AM
> To: [EMAIL PROTECTED]
> Subject: Re: guessing number of reducers.
>
>
>
> Hi Sasha
>
> In general, the number of reduce tasks is chosen mainly based on the data
> volume reaching the reduce phase. In tools like Hive and Pig there is, by
> default, one reducer for every 1GB of map output, so 100 gigs of map output
> means 100 reducers.
> If your tasks are more CPU intensive, you need a smaller volume of data per
> reducer for better performance.
>
> In general it is better to have the number of reduce tasks slightly less
> than the number of available reduce slots in the cluster.
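
If you want the driver to respect that cap without hard-coding the slot count, the cluster's reduce task capacity can be looked up at submission time. A rough sketch, assuming a classic JobTracker cluster (the class name and the one-slot margin are made up for the example):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReducerCap {
    /** Cap a computed reducer count slightly below the cluster's reduce slot capacity. */
    public static int capToClusterCapacity(Configuration conf, int computed) throws IOException {
        JobClient client = new JobClient(new JobConf(conf));
        try {
            ClusterStatus status = client.getClusterStatus();
            int maxReduceSlots = status.getMaxReduceTasks();
            // Leave one slot free; the margin here is arbitrary.
            return Math.max(1, Math.min(computed, maxReduceSlots - 1));
        } finally {
            client.close();
        }
    }
}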
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>  ------------------------------
>
> From: jamal sasha <[EMAIL PROTECTED]>
> Date: Wed, 21 Nov 2012 11:38:38 -0500
> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: guessing number of reducers.
>
>
>
> By default the number of reducers is set to 1.
> Is there a good way to guess the optimal number of reducers?
> Or let's say I have TBs worth of data... mappers are on the order of 5000 or so...
> But ultimately I am calculating, let's say, some average over the whole data...
> say the average transaction occurring...
> Now the output will be just one line in one "part" file... the rest of them will
> be empty. So I am guessing I need loads of reducers, but then most of them will
> be empty, while at the same time one reducer won't suffice.
> What's the best way to solve this?
> How do I guess the optimal number of reducers?
> Thanks
>

Manoj Babu 2012-11-22, 04:45
Kartashov, Andy 2012-11-21, 16:43