Yes, it will increase performance by reducing the number of mappers, so that
each mapper can be given more memory. The right value depends on the
application and the RAM available. For your use case I think 512 MB to 1 GB
will be better.
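A minimal sketch of what that setting looks like, assuming Hadoop 0.20's `dfs.block.size` property in hdfs-site.xml, which that release reads as a plain number of bytes (so write 536870912 rather than "512m"). The helper `block_size_property` is hypothetical; it only prints the `<property>` element to paste inside `<configuration>`. A new block size applies to files written after the change; files already in HDFS keep the block size they were written with.

```python
import xml.etree.ElementTree as ET

MB = 1024 * 1024  # dfs.block.size is given in bytes, so convert from MB

def block_size_property(size_mb: int) -> str:
    """Render a dfs.block.size <property> element for hdfs-site.xml (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.block.size"
    ET.SubElement(prop, "value").text = str(size_mb * MB)
    return ET.tostring(prop, encoding="unicode")

# Print fragments for the current 128 MB value and the suggested 512 MB - 1 GB range.
for size_mb in (128, 512, 1024):
    print(f"{size_mb:>5} MB: {block_size_property(size_mb)}")
```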
On Tue, Jun 21, 2011 at 4:28 PM, Avi Vaknin <[EMAIL PROTECTED]> wrote:
> The block size is configured to 128 MB. I've read that it is recommended to
> increase it in order to get better performance.
> What value do you recommend setting it to?
> -----Original Message-----
> From: madhu phatak [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 21, 2011 12:54 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Help with adjusting Hadoop configuration files
> If you reduce the default DFS block size (set in the configuration file) and
> use the default InputFormat, more mappers are created at a time, which may
> help you use the RAM effectively. Another way is to programmatically run as
> many parallel jobs as possible so that all available RAM is used.
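To see why block size changes the number of mappers: with the default FileInputFormat and no minimum split size configured, each input file is cut into roughly one split per block, and each split gets one map task. The sketch below is a simplified model of that rule, not Hadoop's code: it ignores `mapred.min.split.size`, the `mapred.map.tasks` hint and non-splittable compressed files. The 10% slack on the last split mirrors FileInputFormat's `SPLIT_SLOP` constant; `count_splits` and the 10 GB input are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a trailing chunk up to 10% over the split size stays in the last split

def count_splits(file_size: int, split_size: int) -> int:
    """Approximate number of map tasks created for one splittable input file."""
    if file_size <= 0 or split_size <= 0:
        raise ValueError("sizes must be positive")
    splits, remaining = 0, file_size
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (at most 1.1 split sizes) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits

ten_gb = 10 * 1024 * MB
for block_mb in (64, 128, 512, 1024):
    print(f"{block_mb:>5} MB blocks -> {count_splits(ten_gb, block_mb * MB)} map tasks")
```

Smaller blocks give more, shorter map tasks (more parallelism, more per-task overhead); larger blocks give fewer, longer ones.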
> On Tue, Jun 21, 2011 at 3:17 PM, Avi Vaknin <[EMAIL PROTECTED]> wrote:
> > Hi Madhu,
> > First of all, thanks for the quick reply.
> > I've searched the net for the properties in the configuration files and
> > specifically wanted to know whether there is a property related to memory
> > tuning (as you can see, I have 7.5 GB of RAM on each datanode and I really
> > want to use it properly).
> > Also, I've changed mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum to 10 (the number of cores on the
> > datanodes) and unfortunately I haven't seen any change in the performance
> > or duration of the running jobs.
> > Avi
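Raising `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` only adds concurrent task JVMs; each JVM's heap is still set by `mapred.child.java.opts` (default `-Xmx200m` in 0.20), and all of those JVMs, plus the DataNode and TaskTracker daemons, share the node's RAM. The arithmetic below is a rough back-of-the-envelope sketch: the 1000 MB daemon heap is the `HADOOP_HEAPSIZE` default, the 512 MB OS reserve is an assumed figure, the helper names are hypothetical, and real JVMs use some memory beyond their heap.

```python
def heap_per_slot_mb(node_ram_mb, map_slots, reduce_slots,
                     daemon_heap_mb=1000, os_reserve_mb=512):
    """Rough upper bound on -Xmx (MB) per task JVM without overcommitting RAM.

    Subtracts one heap each for the DataNode and TaskTracker daemons plus an
    assumed OS reserve, then divides what is left across all task slots.
    """
    free_mb = node_ram_mb - 2 * daemon_heap_mb - os_reserve_mb
    slots = map_slots + reduce_slots
    if slots <= 0 or free_mb <= 0:
        raise ValueError("no memory left for task JVMs")
    return free_mb // slots

def map_waves(num_map_tasks, nodes, map_slots):
    """Rounds of map tasks needed when every map slot in the cluster is busy."""
    total_slots = nodes * map_slots
    return -(-num_map_tasks // total_slots)  # ceiling division

node_ram_mb = 7680  # 7.5 GB datanode
for maps, reduces in ((10, 10), (4, 2)):
    heap = heap_per_slot_mb(node_ram_mb, maps, reduces)
    print(f"{maps} map + {reduces} reduce slots -> about -Xmx{heap}m per task")
print("80 map tasks on 2 nodes x 10 map slots:", map_waves(80, 2, 10), "waves")
```

Under these assumptions, 20 slots leave only about 258 MB per task, so adding slots can trade memory per task for concurrency without making jobs faster.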
> > -----Original Message-----
> > From: madhu phatak [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, June 21, 2011 12:33 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Help with adjusting Hadoop configuration files
> > The utilization of the cluster depends on the number of jobs and the number
> > of mappers and reducers. The configuration files only help you set up the
> > cluster by specifying its details. You can also specify some settings, such
> > as block size and replication, in the configuration files, which may help
> > you with job management. You can read about all the available configuration
> > properties here:
> > http://hadoop.apache.org/common/docs/current/cluster_setup.html
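The configuration files are plain XML: each `*-site.xml` has a `<configuration>` root holding `<property>` entries with a `<name>` and a `<value>`, and those values override the shipped defaults (core-default.xml, hdfs-default.xml, mapred-default.xml). A minimal Python sketch for checking what a site file actually sets, for example before comparing two clusters, could look like this; `load_site_xml` is a hypothetical helper and the sample contents are illustrative, not taken from the thread.

```python
import xml.etree.ElementTree as ET

def load_site_xml(text: str) -> dict:
    """Collect name -> value pairs from a Hadoop *-site.xml document."""
    root = ET.fromstring(text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf

# Hypothetical hdfs-site.xml contents used only for illustration.
SAMPLE = """<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""

conf = load_site_xml(SAMPLE)
block_mb = int(conf["dfs.block.size"]) // (1024 * 1024)
print("block size:", block_mb, "MB; replication:", conf["dfs.replication"])
```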
> > On Tue, Jun 21, 2011 at 2:13 PM, Avi Vaknin <[EMAIL PROTECTED]> wrote:
> > > Hi Everyone,
> > > We are a start-up company that has been using Hadoop (version 0.20.2) on
> > > Amazon EC2.
> > > We tried to set up a cluster in two different ways:
> > > Cluster 1: 1 master (namenode) + 5 datanodes; all of the machines are
> > > small EC2 instances (1.6 GB RAM).
> > > Cluster 2: 1 master (namenode) + 2 datanodes; the master is a small EC2
> > > instance and the two datanodes are large EC2 instances (7.5 GB RAM).
> > > We made changes to the configuration files (core-site.xml, hdfs-site.xml
> > > and mapred-site.xml) and expected to see a significant improvement in the
> > > performance of cluster 2, but unfortunately this has not happened yet.
> > >
> > > Are there any special parameters in the configuration files that we need
> > > to change in order to adapt Hadoop to larger hardware?
> > > Are there any best practices you recommend?
> > >
> > > Thanks in advance.
> > >
> > > Avi