Hadoop >> mail # user >> 100x slower mapreduce compared to pig


100x slower mapreduce compared to pig
I am comparing the runtime of the same logic in Pig and in a plain
MapReduce job. The logic is exactly the same, but surprisingly the
MapReduce job that I submit is 100x slower. For Pig I use a UDF, and for
Hadoop I use a mapper only, with the same logic as the Pig UDF. Even the
splits on the admin page are the same. I am not sure why it's so slow. I am
submitting the job like this:

java -classpath
.:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar
com.services.dp.analytics.hadoop.mapred.FormMLProcessor
/examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq
/examples/output1/

How should I go about finding the root cause of why it's so slow? Any
suggestions would be really appreciated.

One thing I noticed is that on the admin page, the map task list for my
job shows the status as
"hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728", but for Pig the
status is blank.
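
For reference, that status string looks like the textual form of a file split: the file path, then the start offset, then "+" and the length in bytes. Here is a minimal sketch that decodes it, assuming that "<path>:<start>+<length>" layout; the SplitStatus class and its methods are hypothetical helpers written for this post, not part of Hadoop.

```java
// Hypothetical helper, not a Hadoop class. It assumes the status string has
// the layout "<path>:<start>+<length>", as the one on the admin page appears to.
class SplitStatus {

    // Returns {start offset, length in bytes} taken from the end of the string.
    static long[] parse(String status) {
        int colon = status.lastIndexOf(':');
        int plus = status.lastIndexOf('+');
        if (colon < 0 || plus < colon) {
            throw new IllegalArgumentException("unexpected status: " + status);
        }
        long start = Long.parseLong(status.substring(colon + 1, plus));
        long length = Long.parseLong(status.substring(plus + 1));
        return new long[] { start, length };
    }

    // Returns everything before the last ':' (the file path part).
    static String pathOf(String status) {
        int colon = status.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("unexpected status: " + status);
        }
        return status.substring(0, colon);
    }

    public static void main(String[] args) {
        String status = "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728";
        long[] range = parse(status);
        long mib = range[1] / (1024L * 1024L);
        System.out.println(pathOf(status) + " start=" + range[0]
                + " length=" + range[1] + " bytes (" + mib + " MiB)");
    }
}
```

So the task in question is reading from offset 0 for 134217728 bytes, which is exactly 128 MiB of testfile40.seq.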