I agree that resuming the process is best handled by site-local tooling.
We could, however, do a better job of informing that tooling about the
nature of the failure. Well-defined exit codes, for instance, may be useful.
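
To make the exit-code idea concrete, here is a rough sketch of how a site-local wrapper might decide whether to resume or to alert an operator. The code values and the names `ExitCode`, `next_action`, and `run_and_decide` are hypothetical and are not defined by any existing tool; the two non-zero values are simply borrowed from the BSD sysexits.h convention (EX_TEMPFAIL and EX_SOFTWARE).

```python
import subprocess
from enum import IntEnum
from typing import Sequence


class ExitCode(IntEnum):
    # Hypothetical codes for illustration only.
    SUCCESS = 0
    FATAL_FAILURE = 70      # e.g. inconsistent state; needs an operator
    TRANSIENT_FAILURE = 75  # e.g. a timeout; resuming may help


def next_action(returncode: int) -> str:
    """Map a child's exit status to what the wrapper should do next."""
    if returncode == ExitCode.SUCCESS:
        return "done"
    if returncode == ExitCode.TRANSIENT_FAILURE:
        return "resume"
    # Anything unrecognized is treated as fatal rather than retried blindly.
    return "alert"


def run_and_decide(cmd: Sequence[str]) -> str:
    """Run the command and return the action chosen from its exit status."""
    return next_action(subprocess.run(list(cmd)).returncode)
```

The point is only that a documented distinction between transient and fatal failures lets the wrapper choose without parsing log output.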

On Thursday, March 20, 2014, Du, Jingcheng <[EMAIL PROTECTED]> wrote: