I'm new to Pig development and currently need to profile some UDFs I wrote.
I would really like to know if this is possible and how I can do it.
I'm aware of the pig.udf.profile property, but it's not enough. I need a complete profiler report showing where my CPU time and memory are being spent.
I also tried to use HPROF, but I'm getting some weird results. I'm running the profiler like this:
pig -Dmapred.task.profile.maps=0-1 -Dmapred.task.profile.reduces=0-1 -Dmapred.task.profile=true -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=/tmp/my_profile -f my_pig_script.pig
The command above runs on the cluster, and afterwards I have to ssh into every node to look for the output. I read that using file=%s is supposed to return the profile file to me somehow, but it's not working in my case.
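For reference, the file=%s variant I mean would look roughly like the sketch below. It only builds and prints the command rather than running it. I'm assuming the property is spelled mapred.task.profile.reduces and the HPROF option is force=n, and that Hadoop replaces %s with a per-task output path; the PROFILE_PARAMS and PIG_CMD variable names are just for illustration.

```shell
#!/bin/sh
# Sketch only: builds and prints the command instead of running it,
# so it can be inspected on a machine without Pig installed.

# HPROF agent options. Single quotes keep the shell from touching %s;
# assumption: Hadoop substitutes a per-task output path for %s.
PROFILE_PARAMS='-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=%s'

# Profile map tasks 0-1 and reduce tasks 0-1 (same ranges as in my command).
PIG_CMD="pig -Dmapred.task.profile=true \
-Dmapred.task.profile.maps=0-1 \
-Dmapred.task.profile.reduces=0-1 \
-Dmapred.task.profile.params=$PROFILE_PARAMS \
-f my_pig_script.pig"

# Print the assembled command; '%s\n' is the printf format, not the HPROF one.
printf '%s\n' "$PIG_CMD"
```

The only difference from my original command is file=%s in place of the fixed /tmp/my_profile path.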
I also tried running the same command with the -x local flag, but I got no output from the profiler.
One last thing I want to mention is that I'm writing my code in NetBeans and uploading my program.jar to the cluster to submit it from there. Is there a way to run Pig locally through NetBeans and profile my UDFs from there?
Thank you in advance,