Re: Number of Maps running more than expected
It would be helpful to see some statistics from both jobs, such as bytes
read, bytes written, number of errors, etc.
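As a sketch (assuming a pre-YARN CDH3 cluster; the job ID below is hypothetical, use the one printed when the job was submitted), those counters can be pulled from the command line:

```shell
# "hadoop job -status" prints the job's progress plus its counters
# (HDFS bytes read/written, map input records, spilled records, etc.).
hadoop job -status job_201208161200_0042
```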

On Thu, Aug 16, 2012 at 8:02 PM, Raj Vishwanathan <[EMAIL PROTECTED]> wrote:

> You probably have speculative execution turned on. Extra map and reduce
> tasks are launched in case some of the original attempts run slowly or fail.
>
> Raj
>
>
> Sent from my iPad
> Please excuse the typos.
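For reference, a hedged sketch of turning speculative execution off for a single run (these are the classic 0.20-era property names shipped with CDH3; the jar name and paths are hypothetical):

```shell
# Disable backup ("speculative") task attempts for this job only.
hadoop jar hadoop-examples.jar wordcount \
  -D mapred.map.tasks.speculative.execution=false \
  -D mapred.reduce.tasks.speculative.execution=false \
  /input/text /output/wc
```

If the extra 100 maps disappear after this, speculative execution was the cause.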
>
> On Aug 16, 2012, at 11:36 AM, "in.abdul" <[EMAIL PROTECTED]> wrote:
>
> > Hi Gaurav,
> > The number of maps does not depend on the number of blocks; it depends on
> > the number of input splits. If you have 100 GB of data and 10 splits,
> > then you will see only 10 maps.
> >
> > Please correct me if I am wrong.
> >
> > Thanks and regards,
> > Syed abdul kather
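The splits-drive-maps behaviour can be checked directly: the classic API lets a job bound its split size, and raising the bound yields fewer, larger splits and hence fewer maps. A sketch with hypothetical paths (`mapred.min.split.size` is the pre-YARN property name):

```shell
# Force splits of at least 256 MB so fewer map tasks are launched.
hadoop jar hadoop-examples.jar wordcount \
  -D mapred.min.split.size=268435456 \
  /input/text /output/wc
```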
> > On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <
> > ml-node+[EMAIL PROTECTED]> wrote:
> >
> >> Hi users,
> >>
> >> I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
> >> the 12 nodes and 1 node running the Job Tracker).
> >> In order to perform a WordCount benchmark test, I did the following:
> >>
> >>   - Executed "RandomTextWriter" first to create 100 GB of data (note
> >>   that I changed only the "test.randomtextwrite.total_bytes" parameter;
> >>   the rest were kept at their defaults).
> >>   - Next, executed the "WordCount" program for that 100 GB dataset.
> >>
> >> The "Block Size" in "hdfs-site.xml" is set to 128 MB. Now, according to
> >> my calculation, the total number of maps executed by the wordcount job
> >> should be 100 GB / 128 MB = 102400 MB / 128 MB = 800.
> >> But when I execute the job, it runs a total of 900 maps,
> >> i.e., 100 extra.
> >> So why these extra maps? My job does complete successfully
> >> without any error, though.
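Gaurav's expected figure can be reproduced from the arithmetic alone; a minimal sketch, assuming one split per 128 MB block:

```shell
# 100 GB of input divided into 128 MB splits -> expected number of map tasks.
TOTAL_MB=$((100 * 1024))   # 100 GB expressed in MB
SPLIT_MB=128               # default split size = HDFS block size here
echo $((TOTAL_MB / SPLIT_MB))
```

This prints 800; the thread's observed 900 is 100 more than the split count predicts.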
> >>
> >> Again, if I don't execute the "RandomTextWriter" job to create the data
> >> for my wordcount, but instead put my own 100 GB text file into HDFS and
> >> run "WordCount", then the number of maps matches my calculation,
> >> i.e., 800.
> >>
> >> Can anyone tell me why Hadoop behaves this oddly with the number of
> >> maps for WordCount only when the dataset is generated by
> >> RandomTextWriter? And what is the purpose of these extra maps?
> >>
> >> Regards,
> >> Gaurav Dasgupta
> >>
> >>
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Number-of-Maps-running-more-than-expected-tp4001631p4001683.html
> > Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>