Pig, mail # user - need compress for ObjectSerializer

need compress for ObjectSerializer
Haitao Yao 2012-11-06, 04:06
Hi all,
I think we need to optimize org.apache.pig.impl.util.ObjectSerializer. It uses Java object serialization, which wastes a lot of space and causes the TaskTracker to run out of memory (OOME). Here is the analysis of the TaskTracker heap dump:

This shows that the heap is retained by the JobConf objects, and we know a JobConf holds a lot of key-value strings.

So here are the heap retention statistics:

And diving into the object histogram, here it is:

And here's the source code:

So I think we need to compress the output of ObjectSerializer. I'm submitting my patch.
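A minimal sketch of the compression idea, assuming the serialized form must still end up as a String (since it is stored as a JobConf value): write the object with Java serialization through a `java.util.zip.DeflaterOutputStream`, then Base64-encode the compressed bytes. The class and method names (`CompressedSerializerSketch`, `serializeCompressed`, `deserializeCompressed`) and the sample payload are hypothetical; this is not the code in the submitted patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not Pig's ObjectSerializer.
public class CompressedSerializerSketch {

    // Plain Java serialization, returning the raw (uncompressed) bytes.
    static byte[] javaSerialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Java-serialize through a deflater, then Base64 so the result is a plain String.
    static String serializeCompressed(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also finishes and closes the DeflaterOutputStream.
        try (ObjectOutputStream out = new ObjectOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the steps: Base64-decode, inflate, then Java-deserialize.
    static Object deserializeCompressed(String str) throws IOException, ClassNotFoundException {
        byte[] compressed = Base64.getDecoder().decode(str);
        try (ObjectInputStream in = new ObjectInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, highly repetitive payload.
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            payload.add("field_" + (i % 10));
        }
        byte[] raw = javaSerialize(payload);
        String compressed = serializeCompressed(payload);
        System.out.println("raw serialized bytes: " + raw.length);
        System.out.println("compressed + Base64 chars: " + compressed.length());
        System.out.println("round trip ok: " + payload.equals(deserializeCompressed(compressed)));
    }
}
```

Compression helps most when the serialized bytes are repetitive; for very small objects the deflate header plus Base64's roughly 4/3 expansion can make the output larger than plain serialization.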

Haitao Yao
weibo: @haitao_yao
Skype:  haitao.yao.final