Re: 2 Map tasks running for a small input file
Hi Sai,

What Viji indicated is that the default Apache Hadoop setting for any
input is 2 maps. If the input is larger than one block, the regular
split policies, such as those Shekhar described, apply. But for
smaller inputs, just to give an out-of-the-box "parallelism
experience", Hadoop ships with a default that forces 2 splits
(mapred.map.tasks=2).

This means your 5 lines are probably divided 2:3, or in some other
ratio, and processed by 2 different tasks. As Viji also indicated, to
turn off this behavior you can set mapred.map.tasks to 1 in your
configs, and then you'll see a single map task process all 5 lines.
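For example, to apply this per job from an old-API
(org.apache.hadoop.mapred) driver rather than in mapred-site.xml, a
minimal sketch would look roughly like the below. WordCountMapper and
WordCountReducer are placeholder class names standing in for your own
mapper and reducer, not classes from this thread:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class WordCountDriver {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCountDriver.class);
      conf.setJobName("wordcount");

      // Same effect as mapred.map.tasks=1: request a single map task
      // instead of the out-of-the-box default of 2. This is only a
      // hint, but for a one-block input it takes effect.
      conf.setNumMapTasks(1);

      conf.setMapperClass(WordCountMapper.class);    // placeholder mapper
      conf.setReducerClass(WordCountReducer.class);  // placeholder reducer
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
    }
  }

You can get the same result without recompiling by passing
-D mapred.map.tasks=1 on the job's command line, provided the driver
goes through ToolRunner/GenericOptionsParser.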

On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
> Thanks Viji.
> I am a little confused: when the data is small, why would there be 2 tasks?
> You would use a minimum of 2 if you needed it, but in this case it is not
> needed since the data is small,
> so why do 2 map tasks execute?
> Since it results in 1 block with 5 lines of data in it,
> I am assuming this results in 5 map computations, 1 per line,
> and all of them in 1 process/node since I am using a pseudo-distributed VM.
> Where is the second task coming from?
> The 5 computations of map on each line are 1 task.
> Is this right?
> Please help.
> Thanks
>
>
> ________________________________
> From: Viji R <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
> Sent: Thursday, 26 September 2013 5:09 PM
> Subject: Re: 2 Map tasks running for a small input file
>
> Hi,
>
> The default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount successfully,
>> here are the counters:
>>
>> ***************
>> Job Counters (Map / Reduce / Total):
>>   SLOTS_MILLIS_MAPS                                                   0  0  8,386
>>   Launched reduce tasks                                               0  0  1
>>   Total time spent by all reduces waiting after reserving slots (ms)  0  0  0
>>   Total time spent by all maps waiting after reserving slots (ms)     0  0  0
>>   Launched map tasks                                                  0  0  2
>>   Data-local map tasks                                                0  0  2
>>   SLOTS_MILLIS_REDUCES                                                0  0  9,199
>> ***************
>> My question is: why are there 2 launched map tasks when I have only a small
>> file?
>> Per my understanding it is only 1 block,
>> and there should be only 1 split.
>> Then for each line a map computation should occur,
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>
>
>

--
Harsh J