Hadoop >> mail # user >> guessing number of reducers.


jamal sasha 2012-11-21, 16:38
Bejoy KS 2012-11-21, 16:50
Kartashov, Andy 2012-11-21, 17:49
Bejoy KS 2012-11-21, 18:21
jamal sasha 2012-11-21, 18:27
Mohammad Tariq 2012-11-21, 18:04
Re: guessing number of reducers.
Hi,

How can I set the number of reducers in the job conf dynamically?
For example, on heavy-traffic days I get 500GB of data, and on other days
only 100GB.

Thanks in advance!

Cheers!
Manoj.
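One way to approach Manoj's question is to size the job in the driver before submitting it. The sketch below is a minimal illustration under stated assumptions, not a Hadoop API: `ReducerEstimator` and `estimateReducers` are hypothetical names, the day's input size is assumed to be already known in bytes (in a real driver, `FileSystem.getContentSummary(path).getLength()` is one way to obtain it), and input size is used as a stand-in for map output size when applying the roughly-1GB-per-reducer rule of thumb from this thread.

```java
public class ReducerEstimator {
    // Rule of thumb from this thread: about one reducer per 1 GB of data.
    // Assumption: 1 GiB (2^30 bytes) is used here.
    static final long BYTES_PER_REDUCER = 1L << 30;

    /**
     * Hypothetical helper: ceil(inputBytes / bytesPerReducer), clamped to
     * the range [1, maxReducers].
     */
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (inputBytes <= 0) {
            return 1; // an empty input still needs one reducer to produce output
        }
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(needed, maxReducers));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // The two daily volumes from the question; 900 is an arbitrary example cap.
        System.out.println(estimateReducers(500 * gb, BYTES_PER_REDUCER, 900));
        System.out.println(estimateReducers(100 * gb, BYTES_PER_REDUCER, 900));
    }
}
```

The resulting count would then be passed to `Job.setNumReduceTasks(int)` (or `JobConf.setNumReduceTasks(int)` in the old API). If the driver runs through `ToolRunner`, a wrapper script could instead compute the number and pass `-D mapred.reduce.tasks=N` on the command line.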

On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:

>  Bejoy,
>
>
>
> I’ve read somewhere about keeping the number of mapred.reduce.tasks below
> the reduce task capacity. Here is what I just tested:
>
>
>
> Output: 25GB. 8-DataNode cluster with a map and reduce task capacity of 16:
>
>
>
> 1 reducer – 22 mins
> 4 reducers – 11.5 mins
> 8 reducers – 5 mins
> 10 reducers – 7 mins
> 12 reducers – 6.5 mins
> 16 reducers – 5.5 mins
>
>
>
> 8 reducers won the race, but running reducers at max capacity came very
> close. :)
>
>
>
> AK47
>
>
>
>
>
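Working through Andy's timings makes the diminishing returns easier to see. This small program (the class name `ReducerBenchmark` is hypothetical) re-derives each run's speedup relative to the single-reducer run, plus the per-reducer efficiency, from the figures reported above; the 12-reducer entry is read as 6.5 minutes, which is an assumption about the original "6:5mins".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReducerBenchmark {
    /** Speedup of a run relative to the single-reducer baseline. */
    static double speedup(double baselineMins, double mins) {
        return baselineMins / mins;
    }

    /** Reducer count with the lowest wall-clock time. */
    static int fastest(Map<Integer, Double> minsByReducers) {
        int best = -1;
        double bestMins = Double.MAX_VALUE;
        for (Map.Entry<Integer, Double> e : minsByReducers.entrySet()) {
            if (e.getValue() < bestMins) {
                bestMins = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    /** Timings from the test above (reducers -> minutes). */
    static Map<Integer, Double> timings() {
        Map<Integer, Double> t = new LinkedHashMap<>();
        t.put(1, 22.0);
        t.put(4, 11.5);
        t.put(8, 5.0);
        t.put(10, 7.0);
        t.put(12, 6.5); // assumption: "6:5mins" read as 6.5 minutes
        t.put(16, 5.5);
        return t;
    }

    public static void main(String[] args) {
        Map<Integer, Double> t = timings();
        double base = t.get(1);
        for (Map.Entry<Integer, Double> e : t.entrySet()) {
            double s = speedup(base, e.getValue());
            // Efficiency = speedup per reducer; 1.0 would mean perfect linear scaling.
            System.out.printf("%2d reducers: %5.1f min, speedup %.2fx, efficiency %.2f%n",
                    e.getKey(), e.getValue(), s, s / e.getKey());
        }
        System.out.println("fastest: " + fastest(t) + " reducers");
    }
}
```

With these numbers, 8 reducers give a 4.4x speedup (22 / 5) at 0.55 efficiency, while 16 reducers give 4.0x (22 / 5.5) at 0.25, so doubling the reducers past 8 bought nothing in this test.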
> *From:* Bejoy KS [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, November 21, 2012 11:51 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: guessing number of reducers.
>
>
>
> Hi Sasha
>
> In general, the number of reduce tasks is chosen mainly based on the
> volume of data going to the reduce phase. In tools like Hive and Pig, by
> default there is one reducer for every 1GB of map output, so 100GB of map
> output means 100 reducers.
> If your tasks are more CPU-intensive, you need a smaller volume of data
> per reducer for better performance.
>
> In general, it is better to have the number of reduce tasks slightly less
> than the number of available reduce slots in the cluster.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
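Bejoy's last point can be turned into an upper bound on whatever size-based estimate you start with. The Hadoop MapReduce tutorial of this era suggests about 0.95 (or 1.75) × (number of nodes × `mapred.tasktracker.reduce.tasks.maximum`) reducers, where 0.95 lets all reducers launch in a single wave. The sketch below applies only the 0.95 cap; `SlotCap` and `capToSlots` are hypothetical names, and the node and slot counts are inputs you would take from your own cluster configuration.

```java
public class SlotCap {
    /**
     * Hypothetical helper: limit a reducer estimate to ~95% of the cluster's
     * total reduce slots so the whole reduce phase fits in one wave.
     */
    static int capToSlots(int estimate, int nodes, int reduceSlotsPerNode) {
        int totalSlots = nodes * reduceSlotsPerNode;
        int cap = Math.max(1, (int) Math.floor(0.95 * totalSlots));
        return Math.max(1, Math.min(estimate, cap));
    }

    public static void main(String[] args) {
        // Example modeled on Andy's cluster: 8 nodes, 16 reduce slots in total
        // (2 per node is an assumption).
        System.out.println(capToSlots(25, 8, 2)); // 25GB at ~1GB/reducer -> capped to 15
        System.out.println(capToSlots(4, 8, 2));  // already below the cap -> stays 4
    }
}
```

Andy's measurements show that the fastest setting can sit well below this cap (8 reducers beat 16 in his test), so the result is better treated as a ceiling to benchmark under than as a final answer.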
>  ------------------------------
>
> *From: *jamal sasha <[EMAIL PROTECTED]>
>
> *Date: *Wed, 21 Nov 2012 11:38:38 -0500
>
> *To: *[EMAIL PROTECTED]<[EMAIL PROTECTED]>
>
> *ReplyTo: *[EMAIL PROTECTED]
>
> *Subject: *guessing number of reducers.
>
>
>
> By default, the number of reducers is set to 1.
> Is there a good way to estimate the optimal number of reducers?
> Let's say I have terabytes of data, with mappers on the order of 5000.
> But ultimately I am calculating, say, an average over the whole data
> set, e.g. the average transaction.
> Then the output will be just one line in one "part" file, and the rest
> will be empty. So I am guessing I need loads of reducers, but then most
> of them will be empty, and at the same time one reducer won't suffice.
> What's the best way to solve this?
> How do I estimate the optimal number of reducers?
> Thanks
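For Jamal's case (one global average over terabytes), a common technique that this thread does not mention is to pre-aggregate on the map side: each mapper (via in-mapper combining) or a combiner registered with `Job.setCombinerClass` emits a partial `(sum, count)` pair, so a single reducer only has to merge a few small records per map task instead of every input record. The plain-Java sketch below simulates that flow without Hadoop; `PartialAverage`, `SumCount`, and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialAverage {
    /** Partial aggregate: what a combiner or in-mapper combiner would emit. */
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }
    }

    /** "Map side": collapse one split's values into a single (sum, count) pair. */
    static SumCount combine(double[] split) {
        double sum = 0;
        for (double v : split) {
            sum += v;
        }
        return new SumCount(sum, split.length);
    }

    /** "Reduce side": merge all partials, then divide once at the end. */
    static double reduce(List<SumCount> partials) {
        SumCount total = new SumCount(0, 0);
        for (SumCount p : partials) {
            total = total.merge(p);
        }
        return total.count == 0 ? Double.NaN : total.sum / total.count;
    }

    public static void main(String[] args) {
        // Hypothetical transaction values spread over three "mappers".
        double[][] splits = { {10, 20, 30}, {40}, {50, 60} };
        List<SumCount> partials = new ArrayList<>();
        for (double[] s : splits) {
            partials.add(combine(s));
        }
        // The single reducer sees one pair per mapper, not every record.
        System.out.println(partials.size() + " partials, average = " + reduce(partials));
        // prints "3 partials, average = 35.0"
    }
}
```

Carrying the count is the important design choice: averaging the per-mapper means instead would give (20 + 40 + 55) / 3 ≈ 38.3 here, which is wrong because the splits have different sizes.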
>
Bejoy KS 2012-11-21, 18:34
Manoj Babu 2012-11-22, 04:45
Kartashov, Andy 2012-11-21, 16:43