In our experience, the easiest way to debug such problems is to use 'jmap' to
take a few heap snapshots of one of the child tasks on a tasktracker and
analyze them in a profiler such as JProfiler or YourKit. That should give you
a pretty good indication of which objects are using most of the heap.
You can also add JVM options to suspend child tasks on startup and attach a
debugger, but that is more painful in a distributed environment.
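A sketch of that jmap workflow (assuming the JDK tools are on the node's
PATH; <pid> stands for a child task's process id, which you can find with
jps):

```shell
# List the JVMs on the node; the Hadoop child tasks show up here
jps -l

# Dump the heap of one child task to a binary hprof file
jmap -dump:format=b,file=/tmp/task-heap.hprof <pid>

# Or just print a histogram of live objects if a full dump is too heavy
jmap -histo:live <pid> | head -30
```

The .hprof file can then be copied off the node and opened in JProfiler,
YourKit, or similar.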
On Mon, Oct 17, 2011 at 11:34 AM, W.P. McNeill <[EMAIL PROTECTED]> wrote:
> I'm investigating a bug where my mapper and reducer tasks run out of memory.
> It only reproduces when I run on large data sets, so the best way to dig in
> is to launch my job with sufficiently large inputs on the cluster and
> monitor the memory characteristics of the failing JVMs remotely. Java
> Visual VM looks like the tool I want to use. Specifically, I want to use it to take
> heap dumps on my tasks. I can't figure out how to set up the listening end
> on the cluster nodes, however.
> Here is what I have tried:
> 1. *Turn on JMX remote for the tasks*...I added the following JMX remote
> options to the child JVM arguments: ... = false.
> This does not work because there is contention for the JMX remote port when
> multiple tasks run on the same node. All but the first task fail at JVM
> initialization time, causing the job to fail before I can see the repro.
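> (One way around the port clash, sketched here as a suggestion rather than
> something from this thread: enable the JMX agent without a fixed remote
> port, so each child JVM gets its own local connector and tools running on
> the same node can attach by PID. The property name mapred.child.java.opts
> is the Hadoop 1.x child-JVM setting; adjust for your version.)

```shell
# Hypothetical child JVM options (set e.g. via mapred.child.java.opts).
# No jmxremote.port is given, so concurrent tasks on a node do not fight
# over a single port; jvisualvm/jconsole run on that node can still
# attach to each child JVM locally by PID.
-Xmx512m -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false
```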
> 2. *Use jstatd*...I tried running jstatd in the background on my cluster
> nodes. It launches and runs, but when I try to connect using Visual VM,
> nothing happens.
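> (For what it's worth, jstatd usually refuses to serve data until it is
> given a security policy, which can make remote tools appear to connect
> but show nothing. A commonly used, permissive, debug-only setup looks
> like this; you also need to add the remote host explicitly in Visual VM.)

```shell
# Write a permissive policy granting tools.jar all permissions
# (debug clusters only -- this disables jstatd's security checks)
cat > /tmp/jstatd.policy <<'EOF'
grant codebase "file:${java.home}/../lib/tools.jar" {
   permission java.security.AllPermission;
};
EOF

# Run jstatd in the background with that policy
jstatd -J-Djava.security.policy=/tmp/jstatd.policy &
```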
> I am going to try adding -XX:+HeapDumpOnOutOfMemoryError, which will at
> least give me post-mortem information. Does anyone know where the heap dump
> file will be written?
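> (By default the dump is written as java_pid<pid>.hprof in the JVM's
> working directory, which for a Hadoop child task is its task work
> directory; -XX:HeapDumpPath redirects it. A sketch, with a hypothetical
> application jar:)

```shell
# Note the '+' sign: -XX:+HeapDumpOnOutOfMemoryError enables the dump,
# while -XX:-HeapDumpOnOutOfMemoryError would disable it.
# -XX:HeapDumpPath sends the dump somewhere predictable.
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/dumps \
     -jar app.jar   # app.jar is a placeholder, not from the thread
```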
> Has anyone debugged a similar setup? What tools did you use?