MapReduce >> mail # user >> Re: 2 Map tasks running for a small input file


Re: 2 Map tasks running for a small input file
-Dmapred.tasktracker.map.tasks.maximum=1: guys, this property is set
for the TaskTracker. Setting it means that the particular TaskTracker
will not run more than 1 map task in parallel.

For example, if a MapReduce job requires 5 map tasks and you set this
property to 1, then only 1 map task runs at a time on that TaskTracker
and the others wait. Once a task completes, the next one is scheduled.

Could you please send the code you are trying to run (the driver code)
and the contents of mapred-site.xml?

You can control the number of map tasks through the input split size
(mapred.min.split.size, mapred.max.split.size and dfs.block.size):

split size = max(minSplitSize, min(maxSplitSize, blockSize))
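
To make the formula above concrete, here is a minimal Java sketch. It is not the Hadoop source: the class and method names (SplitSizeSketch, computeSplitSize, computeSplitSizeFromHint, countSplits) are made up for illustration, and the 100-byte file length is a hypothetical value. The second method assumes that the older mapred API turns the mapred.map.tasks hint into a "goal size" (file length divided by the hint) that takes the place of maxSplitSize, and the 1.1 slack factor is assumed to mirror the one used by FileInputFormat.

```java
public class SplitSizeSketch {
    // Assumed slack factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** The formula from this message: max(minSplitSize, min(maxSplitSize, blockSize)). */
    static long computeSplitSize(long minSplitSize, long maxSplitSize, long blockSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    /**
     * Assumed old-API variant: the map-count hint (mapred.map.tasks) becomes a
     * goal size that is used in place of maxSplitSize.
     */
    static long computeSplitSizeFromHint(long fileLength, int mapTasksHint,
                                         long minSplitSize, long blockSize) {
        long goalSize = fileLength / Math.max(1, mapTasksHint);
        return Math.max(minSplitSize, Math.min(goalSize, blockSize));
    }

    /** Counts how many splits a single file of fileLength bytes would produce. */
    static int countSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while more than 1.1 x splitSize is left.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left over becomes the final split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB block
        long fileLength = 100;              // hypothetical tiny input file, in bytes

        long withHintOfTwo = computeSplitSizeFromHint(fileLength, 2, 1, blockSize);
        long withHintOfOne = computeSplitSizeFromHint(fileLength, 1, 1, blockSize);

        System.out.println("hint=2: " + countSplits(fileLength, withHintOfTwo) + " split(s)");
        System.out.println("hint=1: " + countSplits(fileLength, withHintOfOne) + " split(s)");
    }
}
```

Under these assumptions, a hint of 2 halves the goal size for a tiny file, so the file is cut into 2 splits even though it fits in one block, while a hint of 1 leaves it as a single split.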

Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Sep 26, 2013 at 6:07 PM, shashwat shriparv
<[EMAIL PROTECTED]> wrote:
> Just try giving -Dmapred.tasktracker.map.tasks.maximum=1 on the command line
> and check how many map tasks it runs. Also set this in
> mapred-site.xml and check.
>
> Thanks & Regards
>
> ∞
>
> Shashwat Shriparv
>
>
>
> On Thu, Sep 26, 2013 at 5:24 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hi Sai,
>>
>> What Viji indicated is that the default Apache Hadoop setting for any
>> input is 2 maps. If the input is larger than one block, regular
>> policies of splitting such as those stated by Shekhar would apply. But
>> for smaller inputs, just for an out-of-box "parallelism experience",
>> Hadoop ships with a 2-maps forced splitting default
>> (mapred.map.tasks=2).
>>
>> This means your 5 lines are probably divided as 2:3 or in some other ratio
>> and are processed by 2 different tasks. As Viji also indicated, to turn off
>> this behavior, you can set the mapred.map.tasks to 1 in your configs
>> and then you'll see only one map task process all 5 lines.
>>
>> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>> > Thanks Viji.
>> > I am a little confused: when the data is small, why would there be 2
>> > tasks? You would use a minimum of 2 if you need it, but in this case
>> > it is not needed because the data is so small,
>> > so why would 2 map tasks execute?
>> > Since the input results in 1 block with 5 lines of data in it,
>> > I am assuming this results in 5 map computations, 1 per line,
>> > all of them in 1 process/node since I am using a pseudo-distributed VM.
>> > Where is the second task coming from?
>> > The 5 map computations, one per line, make up 1 task.
>> > Is this right?
>> > Please help.
>> > Thanks
>> >
>> >
>> > ________________________________
>> > From: Viji R <[EMAIL PROTECTED]>
>> > To: [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
>> > Sent: Thursday, 26 September 2013 5:09 PM
>> > Subject: Re: 2 Map tasks running for a small input file
>> >
>> > Hi,
>> >
>> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
>> > avoid this.
>> >
>> > Regards,
>> > Viji
>> >
>> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>> >> Hi
>> >> Here is the input file for the wordcount job:
>> >> ******************
>> >> Hi This is a simple test.
>> >> Hi Hadoop how r u.
>> >> Hello Hello.
>> >> Hi Hi.
>> >> Hadoop Hadoop Welcome.
>> >> ******************
>> >>
>> >> After running the wordcount successfully,
>> >> here are the counters:
>> >>
>> >> ***************
>> >> Job Counters
>> >> SLOTS_MILLIS_MAPS 0 0 8,386
>> >> Launched reduce tasks 0 0 1
>> >> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> >> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> >> Launched map tasks 0 0 2
>> >> Data-local map tasks 0 0 2
>> >> SLOTS_MILLIS_REDUCES 0 0 9,199
>> >> ***************
>> >> My question: why are there 2 launched map tasks when I have only a
>> >> small file?
>> >> As I understand it, the file is only 1 block
>> >> and should be only 1 split.
>> >> Then a map computation should occur for each line,
>> >> but the counters show 2 map tasks.
>> >> Please let me know.
>> >> Thanks
>> >> Sai
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>