I am comparing the runtime of the same logic in Pig and in a plain Hadoop
MapReduce job. The logic is exactly the same, but surprisingly the
MapReduce job I submit is 100x slower. For Pig I use a UDF, and for Hadoop
I use a mapper-only job with the same logic as the Pig UDF. Even the
splits shown on the admin page are the same. I'm not sure why it's so
slow. I am submitting the job like this:
How should I go about finding the root cause of the slowdown? Any
suggestions would be really appreciated.
One thing I noticed is that on the admin page, the map task list for the
MapReduce job shows the status as
"hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728", but for Pig the
status is blank.
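
For what it's worth, that status string looks like the task's input split
written as path:start+length. That is my reading of the format, not
something I have confirmed for this cluster. A minimal sketch to decode
it, where parse_split_status is a hypothetical helper of my own and not
part of Hadoop:

```python
def parse_split_status(status):
    """Split a 'path:start+length' status string into its three parts.

    Hypothetical helper for illustration; assumes the byte range follows
    the last colon in the string.
    """
    # The path itself contains colons (scheme and port), so split on the last one.
    path, _, byte_range = status.rpartition(":")
    start, _, length = byte_range.partition("+")
    return path, int(start), int(length)


status = "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728"
path, start, length = parse_split_status(status)

# Report the split size in MiB alongside the raw byte count.
print(path, start, length, length / (1024 * 1024))
```

If that reading is right, the task is processing a split that starts at
offset 0 and is 134217728 bytes (128 MiB) long.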