HBase >> mail # user >> Profiling map reduce jobs?


RE: Profiling map reduce jobs?
I'd just advise using MultipleOutputFormat instead of MultipleOutputs.write.

--Sent from my Sony mobile.
On Jun 29, 2013 9:16 PM, "David Poisson" <[EMAIL PROTECTED]>
wrote:

> Just thought I'd provide some insight into our problem.
>
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). After looking at the implementation of that write function in
> MultipleOutputs.java, it appears that every call to
> write(output, key, keyValue, path) creates a context, fetches a conf,
> and fetches a new recordWriter.
>
> We have changed all of those calls to write(output, key, keyValue) (which
> doesn't do that extra work), and it seems to help.
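
To make the cost described above concrete: if setup work (building a context, looking up a writer) runs on every call, it is paid once per record instead of once per output. The sketch below is Hadoop-free and only illustrates the general remedy of caching one writer per output path. PathWriterCache, open, openCount and contents are hypothetical names invented for this example; none of this is the MultipleOutputs implementation.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Hadoop's MultipleOutputs.
final class PathWriterCache {
    // One writer per output path, created lazily on first use.
    private final Map<String, StringWriter> writers = new HashMap<>();
    // Counts how many times the (simulated) expensive setup ran.
    private int openCount = 0;

    // Stand-in for expensive setup such as building a context or opening a file.
    private StringWriter open(String path) {
        openCount++;
        return new StringWriter();
    }

    // Reuses the cached writer for the path; only the first call per path pays for open().
    void write(String path, String key, String value) {
        StringWriter writer = writers.computeIfAbsent(path, this::open);
        writer.write(key + "\t" + value + "\n");
    }

    int openCount() {
        return openCount;
    }

    String contents(String path) {
        StringWriter writer = writers.get(path);
        return writer == null ? "" : writer.toString();
    }

    public static void main(String[] args) {
        PathWriterCache cache = new PathWriterCache();
        // 1000 records spread over 3 paths: setup should run 3 times, not 1000.
        for (int i = 0; i < 1000; i++) {
            cache.write("part-" + (i % 3), "k" + i, "v" + i);
        }
        System.out.println("writers opened: " + cache.openCount());
    }
}
```

Whether a given Hadoop version already caches writers this way is worth checking in its own MultipleOutputs source before drawing conclusions.
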
>
> Does anyone else have any tips for using MultipleOutputs?
>
> We are taking our input and splitting it into 3 files. So it seems to be a
> natural choice for MultipleOutputs. Performance is a bit slow though.
>
> Cheers!
>
> David
> ________________________________________
> From: David Poisson [[EMAIL PROTECTED]]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: [EMAIL PROTECTED]
> Subject: Profiling map reduce jobs?
>
> Howdy,
>      I want to take a look at an MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (Cloudera CDH4).
>
> I have modified my mapred-site.xml with the following (that last one is
> commented out because it crashes my MR job):
>
>   <property>
>     <name>mapred.task.profile</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.maps</name>
>     <value>0-2</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.reduces</name>
>     <value>0-2</value>
>   </property>
>   <!--property>
>     <name>mapred.task.profile.params</name>
>     <value>agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
>   </property-->
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
>
> Ideally, we'd want to know where we are spending most of our time.
>
> Cheers,
>
> David
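
On interpreting the results: a rough sketch of one way to see where time goes, assuming the profile is the text output that hprof writes with cpu=samples. That output contains a table between the lines "CPU SAMPLES BEGIN" and "CPU SAMPLES END" with the columns rank, self, accum, count, trace and method. The class below sums the sample counts per method and lists the heaviest ones. HprofCpuSamples and its methods are names made up for this example, and the file path passed on the command line is whatever profile file your job produced.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes hprof's text format with a CPU SAMPLES table.
final class HprofCpuSamples {

    // Sums the "count" column per method across all rows of the CPU SAMPLES table.
    static Map<String, Integer> countsByMethod(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean inTable = false;
        for (String line : lines) {
            if (line.startsWith("CPU SAMPLES BEGIN")) {
                inTable = true;
                continue;
            }
            if (line.startsWith("CPU SAMPLES END")) {
                break;
            }
            if (!inTable) {
                continue;
            }
            // Assumed columns: rank self accum count trace method
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 6 || !fields[0].matches("\\d+")) {
                continue; // skips the column header and anything unexpected
            }
            counts.merge(fields[5], Integer.parseInt(fields[3]), Integer::sum);
        }
        return counts;
    }

    // Returns up to n entries, ordered by sample count, highest first.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HprofCpuSamples <hprof-text-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, Integer> entry : top(countsByMethod(lines), 10)) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

Grouping by method rather than by trace merges samples that reach the same method through different call stacks, which tends to be a more readable first look. The trace IDs in the same file can then be used to find the full stacks for the heaviest methods.
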