Pig >> mail # user >> Pig UDF profiling


Pig UDF profiling
Hello everyone,

I'm new to Pig development, and I currently need to profile some UDFs I wrote.

I would really like to know if this is possible and how I can do it.

I'm aware of the pig.udf.profile property, but it's not enough: I need a complete profiler report showing where my CPU time and memory go.

I also tried to use HPROF, but I'm getting some weird results. I'm running the profiler like this:

pig -Dmapred.task.profile.maps=0-1 -Dmapred.task.profile.reduces=0-1 -Dmapred.task.profile=true -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=/tmp/my_profile -f my_pig_script.pig

The command above runs on the cluster, and then I have to ssh to every node to search for the results. I read that file=%s is supposed to return the file somehow, but it's not working in my case.
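For what it's worth, a sketch of the same invocation with file=%s (this assumes Hadoop 1.x-style property names, where the framework substitutes a per-task output path for %s and the job client collects the profile files into the directory the job was submitted from, so no ssh-ing to individual nodes should be needed):

```shell
# Sketch: profile the first two map and reduce tasks with HPROF.
# file=%s lets the framework substitute a per-task path and fetch
# the output back to the job submission directory on the client.
pig \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.maps=0-1 \
  -Dmapred.task.profile.reduces=0-1 \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=%s \
  -f my_pig_script.pig
```

Whether the files are fetched automatically may depend on the Hadoop version, so treat this as something to verify on your cluster.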

I also tried running the same command with the -x local flag, but got no output from the profiler.

One last thing I want to mention is that I write my code in NetBeans and upload my program.jar to the cluster to submit it from there. Is there a way I can run Pig locally through NetBeans and profile my UDFs from there?
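For local runs, one approach — a sketch, assuming your Pig launcher honors the PIG_OPTS environment variable — is to attach HPROF to the Pig client JVM itself, since in -x local mode the script and its UDFs execute in-process rather than in spawned task JVMs (which is presumably why the mapred.task.profile.* properties produce nothing there):

```shell
# Sketch: in -x local mode there is no separate task JVM, so attach
# HPROF to the Pig client JVM via PIG_OPTS instead of the
# mapred.task.profile.* properties.
export PIG_OPTS="-agentlib:hprof=cpu=samples,depth=6,force=n,thread=y,verbose=n,file=/tmp/my_profile.txt"
pig -x local -f my_pig_script.pig
# profile output should land in /tmp/my_profile.txt on the local machine
```

The same -agentlib:hprof=... string could equally be pasted into the VM options of a NetBeans run configuration that launches the Pig main class, which would let you profile UDFs without leaving the IDE.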

Thank you in advance,
Anastasis Andronidis