Okay Harsh: your hint was enough to get me back on track! I found the
Linux container logs and they are wonderful :)... I guess at the end of
each container run, logs get propagated into the distributed file system's
/var/log directories.

In any case, once I dug in there, I found the cryptic failure was because
my done_intermediate permissions were bad.

Anyways, thanks for the hint, Harsh! After monitoring the local
/var/log/hadoop-yarn/container/ directory, I was able to see that the
stdout/stderr files were being deleted, and then after some googling I
found a post about how YARN aggregates logs into the DFS.

Anyways, problem solved. For those curious: if you're debugging YARN
Linux containers that are dying (as shown in the [local]
/var/log/hadoop-yarn/ NodeManager logs), you can dig in further after the
task dies by going into

hadoop fs -cat
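For reference, here's a rough sketch of how that digging might look. The aggregated-log location is controlled by yarn.nodemanager.remote-app-log-dir (the /var/log root below matches this cluster's setup; /tmp/logs is a common default elsewhere), and the application/container IDs are placeholders for illustration:

```shell
# Browse the aggregated logs on the DFS to find your application's directory
# (the /var/log root is an assumption based on this cluster's configuration):
hadoop fs -ls /var/log

# Cat a specific aggregated log file once you've found it
# (the application ID and node file name below are placeholders):
hadoop fs -cat /var/log/<user>/logs/application_XXXXXXXXXXXXX_NNNN/<node-file>

# Alternatively, YARN can reassemble all of an app's container logs for you
# after the application finishes:
yarn logs -applicationId application_XXXXXXXXXXXXX_NNNN
```

The `yarn logs` route saves you from having to know the aggregation directory layout at all, as long as log aggregation is enabled.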

On Fri, Feb 14, 2014 at 9:17 AM, German Florez-Larrahondo <
Jay Vyas
