Just in case anyone else ever runs into this.
Lately our cluster kept on killing itself with an OOM message in the kernel log. It took me a while to realize why this happened since no single process was causing it.
I traced it back to a few queries running concurrently on a really small datasets. This caused all of these queries to run localmode. Then I realized there isn't a limit to how many queries can run in localmode and since they use the normal hadoop memory settings it's pretty easy to hit OOM on a machine this way.
I'm not sure about the long term solution (some kind of limit on the number of localmode processes), but for now I'll probably disable localmode on these queries.
Edward Capriolo 2013-02-05, 16:39