Re: How to lower the total number of map tasks
Romedius Weiss 2012-10-03, 04:00
Hi!
According to the article @YDN* "The on-node parallelism is controlled by the mapred.tasktracker.map.tasks.maximum parameter."
[http://developer.yahoo.com/hadoop/tutorial/module4.html]

Also, I think it's better to set the min size instead of the max size, so the algorithm tries to slice the file into chunks of a certain minimal size.

Have you tried to make a custom InputFormat? Might be another, more drastic solution.

Cheers,
R

Quoting Shing Hing Man <[EMAIL PROTECTED]>:

> I only have one big input file.
>
> Shing
>
> ________________________________
> From: Bejoy KS <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:46 PM
> Subject: Re: How to lower the total number of map tasks
>
> Hi Shing
>
> Is your input a single file or set of small files? If latter you
> need to use CombineFileInputFormat.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ________________________________
>
> From: Shing Hing Man <[EMAIL PROTECTED]>
> Date: Tue, 2 Oct 2012 10:38:59 -0700 (PDT)
> To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
> ReplyTo: [EMAIL PROTECTED]
> Subject: Re: How to lower the total number of map tasks
>
> I have tried
> Configuration.setInt("mapred.max.split.size",134217728);
>
> and setting mapred.max.split.size in mapred-site.xml. (
> dfs.block.size is left unchanged at 67108864).
>
> But in the job.xml, I am still getting mapred.map.tasks =242 .
>
> Shing
>
> ________________________________
> From: Bejoy Ks <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:03 PM
> Subject: Re: How to lower the total number of map tasks
>
> Sorry for the typo, the property name is mapred.max.split.size
>
> Also just for changing the number of map tasks you don't need to
> modify the hdfs block size.
>
> On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> You need to alter the value of mapred.max.split size to a value
>> larger than your block size to have less number of map tasks than
>> the default.
>>
>> On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man <[EMAIL PROTECTED]> wrote:
>>
>>> I am running Hadoop 1.0.3 in Pseudo distributed mode.
>>> When I submit a map/reduce job to process a file of size about
>>> 16 GB, in job.xml, I have the following
>>>
>>> mapred.map.tasks =242
>>> mapred.min.split.size =0
>>> dfs.block.size = 67108864
>>>
>>> I would like to reduce mapred.map.tasks to see if it improves
>>> performance.
>>> I have tried doubling the size of dfs.block.size. But
>>> the mapred.map.tasks remains unchanged.
>>> Is there a way to reduce mapred.map.tasks ?
>>>
>>> Thanks in advance for any assistance !
>>> Shing
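
For what it's worth, the min-versus-max question in this thread can be checked with a small standalone sketch. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how I understand FileInputFormat in the new (org.apache.hadoop.mapreduce) API to compute it, and that a remainder of up to 10% over the split size is folded into the last split. The old mapred API uses a different formula, with a goal size derived from mapred.map.tasks, which is not modelled here. The class name, the method names, and the file size (picked so that 64 MB splits give the 242 tasks reported above) are made up for illustration; this is not Hadoop code and it reads no job configuration.

```java
// Standalone sketch, not Hadoop code: models how a split size might be
// derived from the block size and the min/max split-size settings.
class SplitSizeSketch {

    // Assumed formula: clamp the block size between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts splits for one file; a remainder of up to 10% over the split
    // size is assumed to be folded into the last split.
    static int countSplits(long fileSize, long splitSize) {
        final double slop = 1.1;
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > slop) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 67108864L;        // dfs.block.size from the thread (64 MB)
        long fileSize = 242L * blockSize;  // hypothetical size of the input file
        long noMax = Long.MAX_VALUE;       // stands in for "no max split size set"

        // Raising only the max: min(max, block) is still the block size.
        long onlyMax = computeSplitSize(blockSize, 0L, 134217728L);
        // Raising the min above the block size forces larger splits.
        long withMin = computeSplitSize(blockSize, 134217728L, noMax);

        System.out.println("max=128MB: " + countSplits(fileSize, onlyMax) + " splits");
        System.out.println("min=128MB: " + countSplits(fileSize, withMin) + " splits");
    }
}
```

Under these assumptions, setting 128 MB as the max leaves the count at 242, while setting 128 MB as the min halves it to 121, which would match what Shing saw after changing only mapred.max.split.size.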
Shing Hing Man 2012-10-03, 13:50