Pig, mail # user - Pig UDF profiling

Pig UDF profiling
andronat_asf 2013-07-03, 19:53
Hello everyone,

I'm new to Pig development and I currently need to profile some UDFs I wrote.

I would really like to know if this is possible and how I can do it.

I'm aware of the pig.udf.profile property, but it's not enough. I need a complete profiler report showing where my CPU time and memory are being spent.

I also tried to use HPROF, but I'm getting some weird results. I'm running the profiler like this:

pig -Dmapred.task.profile.maps=0-1 -Dmapred.task.profile.reduces=0-1 -Dmapred.task.profile=true -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=/tmp/my_profile -f my_pig_script.pig

The command above runs on the cluster, and afterwards I have to ssh into every node to look for the output. I read that file=%s is supposed to return the profile file to me somehow, but it isn't working in my case.
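
For reference, the file=%s variant I mean looks roughly like the sketch below. It assumes the Hadoop 1.x mapred.task.profile* property names, and that Hadoop replaces %s with a per-task output file name when the task starts, which is how I understand the documentation. The variable names and the build_profile_cmd helper are made up for illustration, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: builds the profiling command piece by piece and prints it.
# Property names assume the Hadoop 1.x "mapred.task.profile*" family.

# HPROF agent options; file=%s is meant to be filled in by Hadoop per task.
HPROF_OPTS="cpu=samples,depth=6,force=n,thread=y,verbose=n,file=%s"

# Task ranges to profile (first two map tasks and first two reduce tasks).
MAP_RANGE="0-1"
REDUCE_RANGE="0-1"

# Hypothetical helper: prints the full pig command for the given script.
build_profile_cmd() {
    script="$1"
    printf 'pig -Dmapred.task.profile=true -Dmapred.task.profile.maps=%s -Dmapred.task.profile.reduces=%s -Dmapred.task.profile.params=-agentlib:hprof=%s -f %s\n' \
        "$MAP_RANGE" "$REDUCE_RANGE" "$HPROF_OPTS" "$script"
}

build_profile_cmd my_pig_script.pig
```

Splitting the agent options from the -D properties at least makes typos in the long one-liner easier to spot.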

I also tried running the same command with the -x local flag, but I got no output from the profiler.

One last thing: I write my code in NetBeans and upload my program.jar to the cluster to submit it from there. Is there a way to run Pig locally through NetBeans and profile my UDFs from there?

Thank you in advance,
Anastasis Andronidis