I think we need to optimize org.apache.pig.impl.util.ObjectSerializer. It uses Java object serialization, which wastes a lot of space and causes the TaskTracker to hit an OOME. Here's the analysis of the TaskTracker heap dump:
This shows that the heap is retained by the JobConf objects, and we know that a JobConf contains a lot of key-value strings.
Here are the heap retention statistics:
And here is the object histogram:
And here's the source code:
So I think we need to compress the output of ObjectSerializer. I'm submitting my patch.
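To illustrate the idea (this is not the patch itself, just a minimal sketch), the serialized bytes can be passed through a deflate stream before being encoded as a string for the JobConf. The class name `CompressedSerializerSketch` and the use of `java.util.Base64` (Java 8+) are assumptions for the example; the real ObjectSerializer may use a different encoding and different method signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not Pig's actual ObjectSerializer.
final class CompressedSerializerSketch {
    private CompressedSerializerSketch() {}

    // Java-serialize the object, deflate the bytes, then Base64-encode them
    // so the result can be stored as a plain string value.
    static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes the DeflaterOutputStream,
        // which flushes the remaining compressed data into `bytes`.
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of serialize: Base64-decode, inflate, then Java-deserialize.
    static Object deserialize(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return in.readObject();
        }
    }
}
```

Java serialization output tends to contain many repeated class and field names, so it usually compresses well. How much space this saves in practice would need to be measured on real job configurations.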