in.abdul 2012-08-16, 18:36
Bertrand Dechoux 2012-08-16, 21:01
Bertrand Dechoux 2012-08-16, 21:35
You probably have speculative execution turned on. With speculative execution, Hadoop launches duplicate attempts of map and reduce tasks that appear to be running slowly; the first attempt to finish wins and the others are killed, which inflates the reported task count.
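If that is the cause, one way to check is to rerun the job with speculative execution disabled and see whether the map count drops back to the expected value. A minimal sketch, assuming the classic MR1 property names used by CDH3 (set in mapred-site.xml or passed per job with -D):

```xml
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```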
On Aug 16, 2012, at 11:36 AM, "in.abdul" <[EMAIL PROTECTED]> wrote:
> Hi Gaurav,
> The number of maps does not depend on the number of blocks; it depends on
> the number of input splits. If you have 100 GB of data divided into 10
> input splits, you will see only 10 maps.
> Please correct me if I am wrong.
> Thanks and regards,
> Syed Abdul Kather
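The split arithmetic behind the point above can be sketched as follows. This is a simplified model of how FileInputFormat-style split sizing determines the map count, not the actual Hadoop source; the min/max split-size values are assumptions:

```java
// Sketch: split size is bounded by the configured min/max split sizes
// and the HDFS block size; one map task runs per split.
public class SplitMath {
    // splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;          // 128 MB, as in hdfs-site.xml
        long totalBytes = 100L * 1024 * 1024 * 1024;  // 100 GB of input
        long split = splitSize(blockSize, 1L, Long.MAX_VALUE);
        long maps = totalBytes / split;               // one map per split
        System.out.println(maps);                     // 100 GB / 128 MB = 800
    }
}
```

With the defaults assumed here, the split size equals the block size, which is why the 100 GB / 128 MB = 800 calculation in the question below is reasonable for a single large, block-aligned file.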
> On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <
> ml-node+[EMAIL PROTECTED]> wrote:
>> Hi users,
>> I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
>> the 12 nodes and 1 node running the Job Tracker).
>> In order to perform a WordCount benchmark test, I did the following:
>> - Executed "RandomTextWriter" first to create 100 GB of data (note that I
>> changed only the "test.randomtextwrite.total_bytes" parameter; everything
>> else was left at the defaults).
>> - Next, executed the "WordCount" program for that 100 GB dataset.
>> The "Block Size" in "hdfs-site.xml" is set as 128 MB. Now, according to my
>> calculation, total number of Maps to be executed by the wordcount job
>> should be 100 GB / 128 MB or 102400 MB / 128 MB = 800.
>> But when I execute the job, it runs a total of 900 Maps, i.e., 100 extra.
>> Why these extra Maps? The job does complete successfully without any
>> error.
>> Again, if I don't execute the "RandomTextWriter" job to create the data
>> for my wordcount, but instead put my own 100 GB text file in HDFS and run
>> "WordCount", the number of Maps matches my calculation, i.e., 800.
>> Can anyone tell me why Hadoop behaves this oddly for WordCount only when
>> the dataset is generated by RandomTextWriter? And what is the purpose of
>> these extra Maps?
>> Gaurav Dasgupta
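Another factor that could account for the gap between 800 and 900 maps (an assumption consistent with the numbers in the thread, not something confirmed in it): input splits are computed per file, so every file whose length is not an exact multiple of the split size contributes one extra, shorter split. RandomTextWriter writes one output file per map, so the 100 GB arrive as many separate files. The file count and sizes below are hypothetical, and the ceiling formula is a simplified model (real Hadoop also merges a trailing sliver smaller than about 10% of the split size into the previous split):

```java
// Hypothetical illustration: per-file splitting can yield more maps than
// totalBytes / blockSize when individual file lengths are not block-aligned.
public class PerFileSplits {
    static final long BLOCK = 128L * 1024 * 1024; // 128 MB split size

    // Simplified: number of splits for one file = ceil(length / splitSize)
    static long splitsFor(long fileLength) {
        return (fileLength + BLOCK - 1) / BLOCK;
    }

    public static void main(String[] args) {
        // Suppose the ~100 GB were written as 100 files, each 8 full blocks
        // plus a 16 MB remainder (sizes made up purely for illustration).
        long fileLength = 8 * BLOCK + 16L * 1024 * 1024;
        long totalMaps = 0;
        for (int i = 0; i < 100; i++) {
            totalMaps += splitsFor(fileLength); // 9 splits per file
        }
        System.out.println(totalMaps); // 100 files x 9 splits = 900 maps
    }
}
```

Under these made-up sizes, 100 non-aligned files produce exactly the 900 maps observed, whereas a single contiguous 100 GB file produces the expected 800, which would also explain why the manually uploaded file behaves differently.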
Mohit Anchlia 2012-08-17, 03:39