Re: Number of Maps running more than expected
Also could you tell us more about your task statuses?
You might also have failed tasks...
Bertrand

On Thu, Aug 16, 2012 at 11:01 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> Well, there is speculative executions too.
>
> http://developer.yahoo.com/hadoop/tutorial/module4.html
>
> *Speculative execution:* One problem with the Hadoop system is that by
>> dividing the tasks across many nodes, it is possible for a few slow nodes
>> to rate-limit the rest of the program. For example if one node has a slow
>> disk controller, then it may be reading its input at only 10% the speed of
>> all the other nodes. So when 99 map tasks are already complete, the system
>> is still waiting for the final map task to check in, which takes much
>> longer than all the other nodes.
>> By forcing tasks to run in isolation from one another, individual tasks
>> do not know *where* their inputs come from. Tasks trust the Hadoop
>> platform to just deliver the appropriate input. Therefore, the same input
>> can be processed *multiple times in parallel*, to exploit differences in
>> machine capabilities. As most of the tasks in a job are coming to a close,
>> the Hadoop platform will schedule redundant copies of the remaining tasks
>> across several nodes which do not have other work to perform. This process
>> is known as *speculative execution*. When tasks complete, they announce
>> this fact to the JobTracker. Whichever copy of a task finishes first
>> becomes the definitive copy. If other copies were executing speculatively,
>> Hadoop tells the TaskTrackers to abandon the tasks and discard their
>> outputs. The Reducers then receive their inputs from whichever Mapper
>> completed successfully, first.
>> Speculative execution is enabled by default. You can disable speculative
>> execution for the mappers and reducers by setting the
>> mapred.map.tasks.speculative.execution and
>> mapred.reduce.tasks.speculative.execution JobConf options to false,
>> respectively.
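For reference, a minimal sketch of how those two options can be set from a job
driver, assuming the old mapred.* JobConf API that the tutorial describes (the
WordCount driver class name is only illustrative):

    import org.apache.hadoop.mapred.JobConf;

    // Sketch only: disable speculative execution for both phases.
    JobConf conf = new JobConf(WordCount.class);   // driver class assumed
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    // The typed setters are equivalent:
    // conf.setMapSpeculativeExecution(false);
    // conf.setReduceSpeculativeExecution(false);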
>
>
>
> Can you tell us your configuration with regard to those parameters?
>
> Regards
>
> Bertrand
>
> On Thu, Aug 16, 2012 at 8:36 PM, in.abdul <[EMAIL PROTECTED]> wrote:
>
>> Hi Gaurav,
>>    The number of maps does not depend on the number of blocks; it depends
>> on the number of input splits. If you have 100 GB of data divided into 10
>> input splits, you will see only 10 maps.
>>
>> Please correct me if I am wrong.
>>
>> Thanks and regards,
>> Syed abdul kather
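That matches what the default FileInputFormat does; as a rough sketch, the split
size it uses is derived from the block size and the min/max split settings
(old mapred.* property names shown; the conf object is assumed for illustration):

    // Rough sketch of the split-size arithmetic in the default FileInputFormat
    // (Hadoop 0.20/CDH3-era property names; verify against your version).
    long blockSize = 128L * 1024 * 1024;                        // dfs.block.size
    long minSize   = conf.getLong("mapred.min.split.size", 1L);
    long maxSize   = conf.getLong("mapred.max.split.size", Long.MAX_VALUE);
    long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
    // Splits never span files, so the map count is roughly the sum of
    // ceil(fileLength / splitSize) over all input files, not totalBytes / splitSize.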
>> On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <
>> ml-node+[EMAIL PROTECTED]> wrote:
>>
>> > Hi users,
>> >
>> > I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
>> > the 12 nodes and 1 node running the Job Tracker).
>> > In order to perform a WordCount benchmark test, I did the following:
>> >
>> >    - Executed "RandomTextWriter" first to create 100 GB data (Note that
>> I
>> >    have changed the "test.randomtextwrite.total_bytes" parameter only,
>> rest
>> >    all are kept default).
>> >    - Next, executed the "WordCount" program for that 100 GB dataset.
>> >
>> > The "Block Size" in "hdfs-site.xml" is set as 128 MB. Now, according to
>> my
>> > calculation, total number of Maps to be executed by the wordcount job
>> > should be 100 GB / 128 MB or 102400 MB / 128 MB = 800.
>> > But when I am executing the job, it is running a total number of 900
>> Maps,
>> > i.e., 100 extra.
>> > So, why this extra number of Maps? Although, my job is completing
>> > successfully without any error.
>> >
>> > Again, if I don't execute the "RandomTextWriter" job to create the data
>> > for my wordcount, but instead put my own 100 GB text file in HDFS and run
>> > "WordCount", then the number of Maps matches my calculation, i.e., 800.
>> >
>> > Can anyone tell me why Hadoop shows this odd behaviour in the number of
>> > Maps for WordCount only when the dataset is generated by RandomTextWriter?
>> > And what is the purpose of these extra Maps?
>> >
>> > Regards,

Bertrand Dechoux