Problems with MR Job running really slowly
I have a job which takes an XML file: the splitter breaks the file into
tags, the mapper parses each tag and sends the data to the
reducer. I am using a custom splitter which reads the file looking for
start and end tags.
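
To make the splitting step concrete, here is a minimal sketch of scanning text for start and end tags. It is not the actual splitter from this job: the class name, the method name and the `<scan` tag are hypothetical, and it works on an in-memory string rather than an HDFS input split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the custom splitter used in the job.
class TagSplitSketch {

    // Collects every substring that begins at startTag and runs through
    // the next endTag. Nested tags of the same name are not handled.
    static List<String> splitTags(String text, String startTag, String endTag) {
        List<String> tags = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated tag: stop rather than emit a fragment
            }
            end += endTag.length();
            tags.add(text.substring(start, end));
            from = end; // resume scanning after this tag
        }
        return tags;
    }

    public static void main(String[] args) {
        // "<scan" is an assumed tag name; it would also match e.g. "<scanner".
        String xml = "<root><scan id=\"1\">a</scan><scan id=\"2\">b</scan></root>";
        for (String tag : splitTags(xml, "<scan", "</scan>")) {
            System.out.println(tag);
        }
    }
}
```

Each string returned would correspond to one record handed to the mapper.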

When I run the code in the splitter and the mapper (generating separate
tags and parsing them),
I can read a file of about 500 MB containing 12000 tags on my local
system in 23 seconds.

When I read a file on HDFS on a local cluster I can read and parse the file
in 38 seconds.

When I run the same code on an eight-node cluster I get 7 map tasks. The
mappers are taking 190 seconds to handle 100 tags, of
which 200 milliseconds is parsing and almost all of the rest of the time is
in context.write. A mapper handling 1600 tags takes about 3 hours.
These are the statistics for a map task. It is true that one tag will be
sent to about 300 keys, but 3 hours to write 1.5 million records and
5 GB still seems
way excessive.

*FileSystemCounters*
FILE_BYTES_READ 816,935,457
HDFS_BYTES_READ 439,554,860
FILE_BYTES_WRITTEN 1,667,745,197
*Performance*
TotalScoredScans 1,660
*Map-Reduce Framework*
Combine output records 0
Map input records 6,134
Spilled Records 1,690,063
Map output bytes 5,517,423,780
Combine input records 0
Map output records 571,475
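
As a rough reading aid for these counters, the sketch below derives two ratios from them: average bytes per map output record, and spilled records per map output record. The class and method names are made up for illustration; the only inputs are the counter values listed above.

```java
// Hypothetical helper; the inputs are the counter values from this post.
class CounterRatios {

    // Plain floating-point division of two counter values.
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        long mapOutputBytes = 5_517_423_780L;   // "Map output bytes"
        long mapOutputRecords = 571_475L;       // "Map output records"
        long spilledRecords = 1_690_063L;       // "Spilled Records"

        System.out.printf("avg bytes per map output record: %.0f%n",
                ratio(mapOutputBytes, mapOutputRecords));
        System.out.printf("spilled records per map output record: %.2f%n",
                ratio(spilledRecords, mapOutputRecords));
    }
}
```

With the numbers above this works out to roughly 9.6 KB per map output record and just under three spilled records per output record.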

Does anyone want to offer suggestions on how to tune the job better?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com