Re: Queries on MRv2
Hi,

- How can one specify that an ApplicationMaster should use a particular
version of the MapReduce library dynamically?

- How does the ApplicationManager pick a node to run the ApplicationMaster?
What resource considerations are taken if any while picking a particular
node to run the ApplicationMaster?

- Who monitors the ResourceManager/ApplicationMaster/NodeManager for
failures so that they can be restarted later? From the blog entry it seems
that the state of the ResourceManager is stored in ZooKeeper and the state
of the ApplicationManager is stored in HDFS.

- It looks like the containers are based on Linux cgroups. So, is MRv2
limited to Linux boxes?

Hopefully the design document from Arun will make me ask fewer queries in
this forum :)

Thanks,
Praveen

On Wed, Jun 15, 2011 at 9:17 AM, Mahadev Konar <[EMAIL PROTECTED]> wrote:

> Praveen,
>  In that case, if a just-launched container is released, the NM will
> be notified via the RM that the container is no longer valid, and the
> NM will go ahead and kill the container.
>
>
> On Tue, Jun 14, 2011 at 8:38 PM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Mahadev,
> >
> > The MapReduce ApplicationMaster might behave well, but what about custom
> > ApplicationMasters for other models?
> >
> >> Q) What happens if an ApplicationMaster asks a NM to launch a container
> >> and then releases the container in the allocate call later?
> >
> >> A) The Application Master only releases the container once the container
> >> is done.
> >
> > Thanks,
> > Praveen
> >
> > On Wed, Jun 15, 2011 at 8:59 AM, Mahadev Konar <[EMAIL PROTECTED]> wrote:
> >
> >> Praveen,
> >>  Answers in line:
> >>
> >> >
> >> > Q) What happens if an ApplicationMaster asks a NM to launch a container
> >> > and then releases the container in the allocate call later?
> >>
> >> The Application Master only releases the container once the container is
> >> done.
> >>
> >> >
> >> > Q) So, the NM watches the UNIX Process/Containers and sends the status
> >> > to the ApplicationManager. Later the ApplicationManager sends the status
> >> > of the containers in response to the allocate call to the
> >> > ApplicationMaster. Why should the ApplicationMaster be aware of the
> >> > container status, since it's already tracking the map/reduce tasks in
> >> > the containers?
> >>
> >> It's just a way to notify the application master as soon as possible
> >> when the containers fail.
> >> This speeds up the notification of failed containers; otherwise the
> >> AM has to wait to discover failures via timeouts.
> >>
> >> >
> >> > Q) Does the ApplicationMaster notify the NodeManager to exit the UNIX
> >> > Process when the map/reduce tasks in that particular container are
> >> > completed? Are the containers re-used?
> >>
> >> Yes, it notifies the NM.
> >>
> >> Containers are not reused as of now. In the future we do see
> >> containers being reused, but we'll need leases to do that.
> >>
> >> >
> >> > Q) The ApplicationManager asks the NodeManager to create a container
> >> > and also launch the map/reduce task in it. From then on the
> >> > ApplicationManager and Map/Reduce tasks interact directly without the
> >> > NodeManager. Am I correct?
> >> >
> >> I think you mean ApplicationMaster. Yes, the applicationmaster and
> >> map/reduce tasks talk directly
> >> without NM being involved.
> >>
> >> > Praveen
> >> >
> >> > On Wed, Jun 15, 2011 at 12:59 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> >> >
> >> >>
> >> >> On Jun 14, 2011, at 6:31 PM, Praveen Sripati wrote:
> >> >>
> >> >>  Hi,
> >> >>>
> >> >>> I have gone through MapReduce NextGen Blog entries and JIRA and have
> >> >>> the following queries
> >> >>>
> >> >>>> There is a single API between the Scheduler and the ApplicationMaster:
> >> >>>>
> >> >>>> (List<Container> newContainers, List<ContainerStatus> containerStatuses)
> >> >>>> allocate(List<ResourceRequest> ask, List<Container> release)
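
The allocate signature quoted above can be read as one call that carries both new requests and releases, and returns both newly granted containers and container statuses. Below is a minimal Java sketch of that shape, under loud assumptions: `ResourceRequest`, `Container`, `ContainerStatus`, `AllocateResponse`, and `ToyScheduler` are hypothetical stand-ins written for this illustration and are not the real Hadoop classes, the `"RELEASED"` state string is made up, and the toy scheduler grants every request without checking capacity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the types named in the quoted
// signature. None of these are the real Hadoop/MRv2 classes.
public class AllocateSketch {

    static final class ResourceRequest {
        final String host;
        final int memoryMb;
        final int numContainers;
        ResourceRequest(String host, int memoryMb, int numContainers) {
            this.host = host;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    static final class Container {
        final int id;
        final String host;
        final int memoryMb;
        Container(int id, String host, int memoryMb) {
            this.id = id;
            this.host = host;
            this.memoryMb = memoryMb;
        }
    }

    static final class ContainerStatus {
        final int containerId;
        final String state; // illustrative value only, e.g. "RELEASED"
        ContainerStatus(int containerId, String state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    // Java has no tuple return type, so the two returned lists are bundled.
    static final class AllocateResponse {
        final List<Container> newContainers;
        final List<ContainerStatus> containerStatuses;
        AllocateResponse(List<Container> newContainers,
                         List<ContainerStatus> containerStatuses) {
            this.newContainers = newContainers;
            this.containerStatuses = containerStatuses;
        }
    }

    // Toy in-memory scheduler: grants everything asked for (assumption).
    static final class ToyScheduler {
        private int nextId = 1;
        private final List<Container> live = new ArrayList<>();

        AllocateResponse allocate(List<ResourceRequest> ask, List<Container> release) {
            // Process releases first and report a status for each one found.
            List<ContainerStatus> statuses = new ArrayList<>();
            for (Container c : release) {
                if (live.remove(c)) {
                    statuses.add(new ContainerStatus(c.id, "RELEASED"));
                }
            }
            // Then grant one container per requested unit.
            List<Container> granted = new ArrayList<>();
            for (ResourceRequest r : ask) {
                for (int i = 0; i < r.numContainers; i++) {
                    Container c = new Container(nextId++, r.host, r.memoryMb);
                    live.add(c);
                    granted.add(c);
                }
            }
            return new AllocateResponse(granted, statuses);
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        // First call: ask for two containers, release nothing.
        AllocateResponse first = scheduler.allocate(
                List.of(new ResourceRequest("node1", 1024, 2)), List.of());
        // Second call: ask for nothing, release one of the granted containers.
        AllocateResponse second = scheduler.allocate(
                List.of(), List.of(first.newContainers.get(0)));
        System.out.println("granted: " + first.newContainers.size());
        System.out.println("statuses: " + second.containerStatuses.size());
    }
}
```

The second call mirrors the scenario discussed in the thread, where a container is handed back through the `release` argument of a later allocate call. What the RM and NM actually do at that point is described in Mahadev's reply and is not modelled here.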