-
How to lower the total number of map tasks
Shing Hing Man 2012-10-02, 16:34
I am running Hadoop 1.0.3 in pseudo-distributed mode. When I submit a map/reduce job to process a file of size about 16 GB, in job.xml, I have the following:
mapred.map.tasks = 242
mapred.min.split.size = 0
dfs.block.size = 67108864
I would like to reduce mapred.map.tasks to see if it improves performance. I have tried doubling the size of dfs.block.size, but mapred.map.tasks remains unchanged. Is there a way to reduce mapred.map.tasks?
Thanks in advance for any assistance!
Shing
-
Re: How to lower the total number of map tasks
Chris Nauroth 2012-10-02, 17:00
Those numbers make sense, considering 1 map task per block. A file of about 16 GB / 64 MB block size = ~242 map tasks (exactly 16 GB would give 256, so the file is presumably a little over 15 GB).
When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?
Thank you, --Chris
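The block arithmetic above can be checked with a few lines of plain Java. This is a minimal sketch, assuming one map task per HDFS block; the class and method names (BlockCount, blocksFor) are made up for illustration and are not part of Hadoop.

```java
// Sketch only: models "one map task per HDFS block" with plain arithmetic.
// No Hadoop classes are used; BlockCount and blocksFor are illustrative names.
class BlockCount {
    /** Number of blocks needed to hold fileSizeBytes, rounding up. */
    static long blocksFor(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;            // dfs.block.size = 67108864
        long sixteenGb = 16L * 1024L * mb;

        // An exact 16 GB file would occupy 256 blocks of 64 MB.
        System.out.println(blocksFor(sixteenGb, blockSize));
        // 242 blocks of 64 MB hold at most 242 * 64 MB = 15488 MB.
        System.out.println(242L * blockSize / mb);
        // The same 15488 MB stored with 128 MB blocks would occupy 121 blocks.
        System.out.println(blocksFor(242L * blockSize, 2 * blockSize));
    }
}
```

So 242 tasks would be consistent with a file of roughly 15.1 GB stored in 64 MB blocks, and the count should only halve if the file itself is stored with 128 MB blocks.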
-
Re: How to lower the total number of map tasks
Bejoy Ks 2012-10-02, 17:01
Hi
You need to set mapred.max.split size to a value larger than your block size to get fewer map tasks than the default.
-
Re: How to lower the total number of map tasks
Bejoy Ks 2012-10-02, 17:03
Sorry for the typo, the property name is mapred.max.split.size.
Also, just to change the number of map tasks, you don't need to modify the HDFS block size.
-
Re: How to lower the total number of map tasks
Shing Hing Man 2012-10-02, 17:33
I set the block size using Configuration.setInt("dfs.block.size",134217728); I have also set it in mapred-site.xml.
Shing
-
Re: How to lower the total number of map tasks
Bejoy KS 2012-10-02, 17:37
Shing
This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. For it to take effect on old files, you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-
Re: How to lower the total number of map tasks
Shing Hing Man 2012-10-02, 17:38
I have tried Configuration.setInt("mapred.max.split.size",134217728);
and setting mapred.max.split.size in mapred-site.xml (dfs.block.size is left unchanged at 67108864).
But in job.xml, I am still getting mapred.map.tasks = 242.
Shing
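One possible explanation for the unchanged count: as far as I understand FileInputFormat in Hadoop 1.x, the split size is max(minSize, min(maxSize, blockSize)) in the new API and max(minSize, min(goalSize, blockSize)) in the old one, so raising only the maximum above the block size would not enlarge the splits. The sketch below reproduces that rule in plain Java; SplitSize and its method names are illustrative, not Hadoop classes, and the rule itself should be checked against the source of the version in use.

```java
// Sketch of the split-size rule assumed for FileInputFormat in Hadoop 1.x.
// SplitSize and its method names are illustrative; no Hadoop classes are used.
class SplitSize {
    /** Old API (org.apache.hadoop.mapred): goalSize = total input size / requested map tasks. */
    static long oldApi(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** New API (org.apache.hadoop.mapreduce): minSize and maxSize come from the split-size properties. */
    static long newApi(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;

        // Raising only the maximum above the block size leaves the split at one block.
        System.out.println(newApi(block, 1L, 128 * mb) / mb);
        // Raising the minimum above the block size makes each split span two blocks.
        System.out.println(newApi(block, 128 * mb, Long.MAX_VALUE) / mb);
        // Old API: a larger goal size is still capped by the block size.
        System.out.println(oldApi(8L * 1024L * mb, 1L, block) / mb);
    }
}
```

Under this rule the minimum split size (mapred.min.split.size) is the setting that can push a split beyond one block, at the cost of some data locality.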
-
Re: How to lower the total number of map tasks
Bejoy KS 2012-10-02, 17:46
Hi Shing
Is your input a single file or a set of small files? If the latter, you need to use CombineFileInputFormat.
Regards
Bejoy KS
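The single-file versus small-files question matters because splits are computed per file, so many small files produce at least one map task each regardless of the split size. A minimal sketch of that counting, assuming the usual loop in FileInputFormat with a 10% slop on the last split; SplitsPerFile and its methods are illustrative names, and empty-file handling is not modelled.

```java
// Sketch only: counts splits per file the way FileInputFormat is assumed to.
// SplitsPerFile is an illustrative name; no Hadoop classes are used.
class SplitsPerFile {
    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Splits produced for one file of fileSize bytes (empty files not modelled). */
    static int splitsFor(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        int count = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            count++;  // the tail (or a whole small file) becomes one split
        }
        return count;
    }

    /** Total splits for a set of files: each file is split independently. */
    static int totalSplits(long[] fileSizes, long splitSize) {
        int total = 0;
        for (long size : fileSizes) {
            total += splitsFor(size, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] manySmall = new long[242];
        for (int i = 0; i < manySmall.length; i++) {
            manySmall[i] = mb;  // 242 files of 1 MB each
        }
        // One split per small file, however large the split size is.
        System.out.println(totalSplits(manySmall, 64 * mb));
        // The same 242 MB as a single file needs far fewer splits.
        System.out.println(totalSplits(new long[] {242 * mb}, 64 * mb));
    }
}
```

CombineFileInputFormat exists to pack several small files into one split, which plain FileInputFormat does not do.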
-
Re: How to lower the total number of map tasks
Shing Hing Man 2012-10-02, 18:17
I have done the following.
1) stop-all.sh
2) In mapred-site.xml, added
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
(dfs.block.size remains unchanged at 67108864)
3) start-all.sh
4) Used hadoop fs -cp src destn to copy my original file to another HDFS directory.
5) Ran my MapReduce program using the new copy of the input file.
However, in job.xml, I still get mapred.map.tasks = 242, which is the same as before. I have also tried deleting my input file in HDFS and importing it again from my local drive.
Any more ideas?
Shing
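One more thing that might be worth trying is the minimum split size rather than the maximum. The sketch below only does the arithmetic: given the input size and a desired number of map tasks, it computes a candidate value for mapred.min.split.size. It assumes the split size ends up equal to that minimum whenever it exceeds the block size, and it has not been tested against this job; MinSplitForTarget and minSplitFor are illustrative names.

```java
// Sketch only: candidate mapred.min.split.size for a desired number of map tasks.
// MinSplitForTarget is an illustrative name; no Hadoop classes are used.
class MinSplitForTarget {
    /** Smallest split size (bytes) that covers inputBytes in at most targetMaps splits. */
    static long minSplitFor(long inputBytes, int targetMaps) {
        if (targetMaps <= 0) {
            throw new IllegalArgumentException("target map count must be positive");
        }
        return (inputBytes + targetMaps - 1) / targetMaps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long input = 242L * 64 * mb;  // assumed input size: 242 full blocks of 64 MB

        // Halving 242 map tasks to 121 needs splits of 134217728 bytes (128 MB).
        System.out.println(minSplitFor(input, 121));
    }
}
```

The value could then be set as mapred.min.split.size in the job configuration; the resulting task count may differ by one or so because of how the last split is handled.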