HBase >> mail # user >> Profiling map reduce jobs?


RE: Profiling map reduce jobs?
I'd just advise using MultipleOutputFormat instead of MultipleOutputs.write.
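
To make the cost described in the quoted message below concrete, here is a minimal standalone sketch of the difference between creating a writer on every call and caching one writer per output path. It is a model only and does not use Hadoop: `CountingWriter`, `WriterFactory`, `writePerCall` and `writeCached` are hypothetical names invented for illustration, and whether a given Hadoop version really creates a writer per call should be checked against its own MultipleOutputs source.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone model (not Hadoop code): per-call writer creation vs. one cached writer per path. */
class WriterCacheSketch {

    /** Hypothetical stand-in for a record writer that is expensive to create. */
    static final class CountingWriter {
        int records = 0;

        void write(String key, String value) {
            records++;
        }
    }

    /** Hypothetical factory that counts how many writers were created. */
    static final class WriterFactory {
        int created = 0;

        CountingWriter create(String path) {
            created++;
            return new CountingWriter();
        }
    }

    /** Naive strategy: build a fresh writer for every record written. */
    static int writePerCall(String[] paths, WriterFactory factory) {
        for (String path : paths) {
            factory.create(path).write("key", "value");
        }
        return factory.created;
    }

    /** Cached strategy: build one writer per distinct path and reuse it. */
    static int writeCached(String[] paths, WriterFactory factory) {
        Map<String, CountingWriter> cache = new HashMap<>();
        for (String path : paths) {
            cache.computeIfAbsent(path, factory::create).write("key", "value");
        }
        return factory.created;
    }

    public static void main(String[] args) {
        // Six records spread over three output paths.
        String[] paths = {"a/part", "b/part", "c/part", "a/part", "b/part", "a/part"};
        System.out.println("writers created per call: " + writePerCall(paths, new WriterFactory()));
        System.out.println("writers created cached:   " + writeCached(paths, new WriterFactory()));
    }
}
```

In this model the per-call strategy creates one writer per record, while the cached strategy creates one per distinct path, so the gap grows with the number of records rather than the number of output files.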

--Sent from my Sony mobile.
On Jun 29, 2013 9:16 PM, "David Poisson" <[EMAIL PROTECTED]>
wrote:

> Just thought I'd provide some insight into our problem.
>
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). After looking at the implementation of that write function
> in MultipleOutputs.java, it appears that a context is created, a conf
> is fetched, and a new RecordWriter is obtained on every call to
> write(output, key, keyValue, path).
>
> We have changed all of those calls to write(output, key, keyValue) (which
> does none of that extra work), and it seems to help.
>
> Does anyone else have any tips for using MultipleOutputs?
>
> We are taking our input and splitting it into 3 files, so MultipleOutputs
> seems to be a natural choice. Performance is a bit slow, though.
>
> Cheers!
>
> David
> ________________________________________
> From: David Poisson [[EMAIL PROTECTED]]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: [EMAIL PROTECTED]
> Subject: Profiling map reduce jobs?
>
> Howdy,
>      I want to take a look at an MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (Cloudera CDH4).
>
> I have modified my mapred-site.xml with the following (that last one is
> commented out because it crashes my MR job):
>
>   <property>
>     <name>mapred.task.profile</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.maps</name>
>     <value>0-2</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.reduces</name>
>     <value>0-2</value>
>   </property>
>   <!--property>
>     <name>mapred.task.profile.params</name>
>     <value>agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
>   </property-->
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
>
> Ideally, we'd want to know where we are spending most of our time.
>
> Cheers,
>
> David
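
On the question of reading the profiler output: with cpu=samples, hprof writes a text report containing a "CPU SAMPLES BEGIN ... CPU SAMPLES END" table whose columns are rank, self, accum, count, trace and method. The following is a minimal sketch, assuming that layout, of pulling the table out so the hottest methods can be listed. The class and method names (`HprofCpuSamples`, `Sample`, `parse`) are hypothetical, and the rows in `main` are made-up sample input, not results from any real job.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: extract the CPU SAMPLES table from an hprof text report. */
class HprofCpuSamples {

    /** One row of the table: share of samples, sample count, and method name. */
    static final class Sample {
        final double selfPercent;
        final int count;
        final String method;

        Sample(double selfPercent, int count, String method) {
            this.selfPercent = selfPercent;
            this.count = count;
            this.method = method;
        }
    }

    /** Returns the rows between "CPU SAMPLES BEGIN" and "CPU SAMPLES END", in file order. */
    static List<Sample> parse(String hprofText) {
        List<Sample> samples = new ArrayList<>();
        boolean inSection = false;
        for (String line : hprofText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("CPU SAMPLES BEGIN")) {
                inSection = true;
                continue;
            }
            if (trimmed.startsWith("CPU SAMPLES END")) {
                break;
            }
            // Skip everything outside the section, blank lines, and the column header.
            if (!inSection || trimmed.isEmpty() || trimmed.startsWith("rank")) {
                continue;
            }
            // Assumed columns: rank, self, accum, count, trace, method.
            String[] f = trimmed.split("\\s+");
            if (f.length < 6) {
                continue;
            }
            double self = Double.parseDouble(f[1].replace("%", ""));
            samples.add(new Sample(self, Integer.parseInt(f[3]), f[5]));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Made-up input for illustration only.
        String report =
              "CPU SAMPLES BEGIN (total = 100) (hypothetical)\n"
            + "rank   self  accum   count trace method\n"
            + "   1 60.00% 60.00%      60 300001 com.example.MyMapper.map\n"
            + "   2 25.00% 85.00%      25 300002 java.lang.String.split\n"
            + "CPU SAMPLES END\n";
        for (Sample s : parse(report)) {
            System.out.println(s.selfPercent + "% (" + s.count + " samples) " + s.method);
        }
    }
}
```

The rows are already ranked by self time in the report, so the first few entries are usually the place to start looking; the trace column can then be matched against the TRACE entries earlier in the same file to see the call stacks.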