RE: Question on MapReduce

This may be dated material.

Cloudera and HDP folks, please correct with updates :)

http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/

http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/

Hope this helps.

-----Original Message-----
From: Satheesh Kumar [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 11, 2012 12:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Question on MapReduce

Thanks, Leo. What is the config of a typical data node in a Hadoop cluster
- cores, storage capacity, and connectivity (SATA?)? How many TaskTracker slots are configured per core, in general?

Is there a best practices guide somewhere?

Thanks,
Satheesh

On Fri, May 11, 2012 at 10:48 AM, Leo Leung <[EMAIL PROTECTED]> wrote:

> Nope, you must tune the config on that specific super node to have
> more M/R slots (this is for 1.0.x). This does not mean the JobTracker
> will be eager to stuff that super node with all the M/R jobs at hand.
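>
> For example, on 1.0.x the slot counts are per-TaskTracker settings in
> mapred-site.xml; something like the following (the values shown are just
> an illustration and depend on your cores and memory):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>8</value>
>   </property>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>4</value>
>   </property>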
>
> It still goes through the scheduler; the Capacity Scheduler is most
> likely what you have (check your config).
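>
> If you are not sure which scheduler is active, look at
> mapred.jobtracker.taskScheduler in mapred-site.xml; for the Capacity
> Scheduler it would look something like this (double-check the property
> against your version's docs):
>
>   <property>
>     <name>mapred.jobtracker.taskScheduler</name>
>     <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
>   </property>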
>
> IMO, if data locality is not going to be there, your cluster is
> going to suffer from network I/O.
>
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:[EMAIL PROTECTED]]
> Sent: Friday, May 11, 2012 9:51 AM
> To: [EMAIL PROTECTED]
> Subject: Question on MapReduce
>
> Hi,
>
> I am a newbie on Hadoop and have a quick question on optimal compute vs.
> storage resources for MapReduce.
>
> If I have a multiprocessor node with 4 processors, will Hadoop
> schedule a higher number of Map or Reduce tasks on that system than on a
> uni-processor system? In other words, does Hadoop detect denser
> systems and schedule more tasks on multiprocessor systems?
>
> If yes, will that imply that it makes sense to attach higher-capacity
> storage to store a larger number of blocks on systems with dense compute?
>
> Any insights will be very useful.
>
> Thanks,
> Satheesh
>