RE: Question on MapReduce
Leo Leung 2012-05-11, 19:58
This may be dated material.
Cloudera and HDP folks, please correct it with updates :)
Hope this helps.
From: Satheesh Kumar [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 11, 2012 12:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Question on MapReduce
Thanks, Leo. What is the config of a typical data node in a Hadoop cluster:
cores, storage capacity, and connectivity (SATA?)? How many TaskTrackers are typically scheduled per core?
Is there a best practices guide somewhere?
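For context on the "per core" question: in Hadoop 1.x each worker node runs a single TaskTracker, and what you size per node is the number of task slots. These come from `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` in that node's mapred-site.xml (both default to 2) and are not derived from the core count. The sketch below turns a core count into those two properties. The ratio of 1.5 slots per core and the roughly two-thirds map share are hypothetical tuning knobs chosen for illustration, not official guidance; real values depend on memory per task and disk count.

```python
import math
import xml.etree.ElementTree as ET

# Hadoop 1.x per-TaskTracker slot limits, read from mapred-site.xml on each node.
MAP_SLOTS_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_SLOTS_KEY = "mapred.tasktracker.reduce.tasks.maximum"


def slot_counts(cores: int, slots_per_core: float = 1.5,
                map_share: float = 2 / 3) -> tuple[int, int]:
    """Split a node's task slots between map and reduce.

    slots_per_core and map_share are hypothetical heuristics, not Hadoop defaults.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    total = max(2, math.floor(cores * slots_per_core))  # at least 1 map + 1 reduce
    maps = min(total - 1, max(1, round(total * map_share)))
    return maps, total - maps


def mapred_site_xml(cores: int, **kwargs) -> str:
    """Render the two slot properties as a mapred-site.xml <configuration> fragment."""
    maps, reduces = slot_counts(cores, **kwargs)
    conf = ET.Element("configuration")
    for key, value in ((MAP_SLOTS_KEY, maps), (REDUCE_SLOTS_KEY, reduces)):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


print(mapred_site_xml(4))
```

With these assumed ratios a 4-core node gets 4 map slots and 2 reduce slots. A "super node" only gets more slots if its own mapred-site.xml says so, and its TaskTracker must be restarted to pick up the change.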
On Fri, May 11, 2012 at 10:48 AM, Leo Leung <[EMAIL PROTECTED]> wrote:
> Nope, you must tune the config on that specific super node to give it
> more M/R slots (this applies to 1.0.x). This does not mean the JobTracker
> will eagerly stuff that super node with all the M/R jobs at hand.
> It still goes through the scheduler; the Capacity Scheduler is most
> likely what you have (check your config).
> IMO, if data locality is not going to be there, your cluster is
> going to suffer from network I/O.
> -----Original Message-----
> From: Satheesh Kumar [mailto:[EMAIL PROTECTED]]
> Sent: Friday, May 11, 2012 9:51 AM
> To: [EMAIL PROTECTED]
> Subject: Question on MapReduce
> I am a newbie on Hadoop and have a quick question on optimal compute vs.
> storage resources for MapReduce.
> If I have a multiprocessor node with 4 processors, will Hadoop
> schedule a higher number of Map or Reduce tasks on it than on a
> uni-processor system? In other words, does Hadoop detect denser
> systems and schedule more tasks on multiprocessor systems?
> If yes, does that imply it makes sense to attach higher-capacity
> storage to store more blocks on systems with dense compute?
> Any insights will be very useful.
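To illustrate Leo's point about slots versus data locality, here is a toy model of one wave of map tasks. It is not Hadoop's JobTracker or Capacity Scheduler: it ignores rack locality, heartbeat timing, and multiple waves, and the node names, slot counts, and block IDs are all hypothetical. Each node fills its free slots, preferring a pending task whose input block it stores locally and otherwise taking any pending task, which means reading the block over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    map_slots: int  # static per-node setting in Hadoop 1.x, not derived from core count
    blocks: set[str] = field(default_factory=set)  # input blocks with a replica here
    running: list[str] = field(default_factory=list)

    def free(self) -> bool:
        return len(self.running) < self.map_slots


def schedule_wave(pending: list[str], nodes: list[Node]) -> tuple[int, int]:
    """Toy pull model: each node fills its free slots, preferring a pending task whose
    input block it stores, else taking any pending task (a remote, network read).
    Mutates `pending` and each node's `running`; returns (local, remote) counts."""
    local = remote = 0
    for node in nodes:
        while node.free() and pending:
            task = next((b for b in pending if b in node.blocks), None)
            if task is None:
                task = pending[0]  # no local block left: read input over the network
                remote += 1
            else:
                local += 1
            pending.remove(task)
            node.running.append(task)
    return local, remote


def cluster(super_blocks: set[str]) -> list[Node]:
    # Two ordinary nodes with 2 map slots each, plus one "super node" with 8 slots.
    return [
        Node("small-1", 2, {"b0", "b1", "b2", "b3"}),
        Node("small-2", 2, {"b4", "b5", "b6", "b7"}),
        Node("super", 8, set(super_blocks)),
    ]


tasks = [f"b{i}" for i in range(8)]
print(schedule_wave(list(tasks), cluster(set())))  # (4, 4)
print(schedule_wave(list(tasks), cluster({"b2", "b3", "b6", "b7"})))  # (8, 0)
```

In the first run the super node's extra slots are filled entirely with remote reads. In the second, it happens to hold replicas of those blocks, so every task is data-local. The model simply assumes where the blocks live; it does not model how HDFS actually places replicas.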