Hadoop >> mail # user >> Job end notification does not always work (Hadoop 2.x)


Job end notification does not always work (Hadoop 2.x)
Hello,

I came across an issue with the job end notification callbacks in
MR2. The callback works fine once the ApplicationMaster has started, but no
callback is sent if initialization of the AM fails.

Here is the relevant code from MRAppMaster.java:

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch the JobFinishEventHandler, which is
responsible for sending the HTTP callback (via shutDownJob()). If an
exception is thrown at this point, the process simply terminates (via
System.exit(1)) without sending the callback.

appMaster.start(), however, correctly uses the JobFinishEventHandler, and
things work fine.

Shouldn't a failure in init(..) also send a callback indicating that the
job failed?
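
To make the question concrete, here is a minimal, self-contained sketch of the
behaviour being asked for. It is not the actual MRAppMaster code: the
AppMaster and Notifier interfaces, the run() method, and the "FAILED" status
string are all hypothetical stand-ins, and the sketch returns an exit code
instead of calling System.exit so that it can be run in isolation. The idea is
only that the catch block attempts a best-effort failure notification before
the process goes away.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotificationSketch {

    /** Hypothetical stand-in for the application master lifecycle. */
    interface AppMaster {
        void init() throws Exception;
        void start() throws Exception;
    }

    /** Hypothetical stand-in for whatever sends the job end callback. */
    interface Notifier {
        void notifyJobEnd(String status);
    }

    /**
     * Runs init() and start(). If either throws, a FAILED notification is
     * attempted before a non-zero exit code is returned. On success nothing
     * is sent here; that is left to the normal job-finish path.
     */
    static int run(AppMaster appMaster, Notifier notifier) {
        try {
            appMaster.init();
            appMaster.start();
            return 0;
        } catch (Throwable t) {
            System.err.println("Error starting app master: " + t);
            try {
                // Best effort: a failing notifier must not mask the original error.
                notifier.notifyJobEnd("FAILED");
            } catch (RuntimeException e) {
                System.err.println("Could not send failure notification: " + e);
            }
            return 1;
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();

        // An app master whose init() fails, as in the scenario described above.
        AppMaster failingInit = new AppMaster() {
            @Override
            public void init() throws Exception {
                throw new IllegalStateException("init failed");
            }

            @Override
            public void start() {
            }
        };

        int exitCode = run(failingInit, sent::add);
        System.out.println("exit=" + exitCode + " notifications=" + sent);
        // prints: exit=1 notifications=[FAILED]
    }
}
```

In the real code the notification would presumably have to reuse whatever
shutDownJob() does today, and it would need to cope with the job configuration
not being fully loaded when init() fails; the sketch ignores both points.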

Thanks,
Prashant