Re: Number of Maps running more than expected
Hi Gaurav,
   The number of maps is not determined by the number of blocks; it is
determined by the number of input splits. If you have 100 GB of data in 10
splits, you will see only 10 maps.

Please correct me if I am wrong.

Thanks and regards,
Syed Abdul Kather
On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <
ml-node+[EMAIL PROTECTED]> wrote:

> Hi users,
>
> I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
> 12 nodes and 1 node running the Job Tracker).
> To perform a WordCount benchmark test, I did the following:
>
>    - Executed "RandomTextWriter" first to create 100 GB of data (note that I
>    changed only the "test.randomtextwrite.total_bytes" parameter; all
>    others were left at their defaults).
>    - Next, executed the "WordCount" program on that 100 GB dataset.
>
> The "Block Size" in "hdfs-site.xml" is set to 128 MB. According to my
> calculation, the total number of Maps executed by the WordCount job
> should be 100 GB / 128 MB, i.e., 102400 MB / 128 MB = 800.
> But when I run the job, it runs a total of 900 Maps, i.e., 100 extra.
> So why the extra Maps? That said, my job completes successfully
> without any errors.
>
> On the other hand, if I don't run the "RandomTextWriter" job to create
> data for my WordCount, but instead put my own 100 GB text file in HDFS
> and run "WordCount", the number of Maps matches my calculation,
> i.e., 800.
>
> Can anyone explain this odd behaviour of Hadoop regarding the number of
> Maps for WordCount, which occurs only when the dataset is generated by
> RandomTextWriter? And what is the purpose of these extra Maps?
>
> Regards,
> Gaurav Dasgupta
--
View this message in context: http://lucene.472066.n3.nabble.com/Number-of-Maps-running-more-than-expected-tp4001631p4001683.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.