RE: Profiling map reduce jobs?
Azuryy Yu 2013-06-29, 14:34
I would just advise using MultipleOutputFormat instead of MultipleOutputs.write.
On Jun 29, 2013 9:16 PM, "David Poisson" <[EMAIL PROTECTED]> wrote:
> Just thought I'd provide some insight into our problem.
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). After looking at the implementation of that write function in
> MultipleOutputs.java, it appears that every call to
> write(output, key, keyValue, path) creates a context, fetches a conf,
> and fetches a new RecordWriter.
> We have changed all of those calls to write(output, key, keyValue) (which
> does none of that extra work), and it seems to help.
> Does anyone else have any tips for using MultipleOutputs?
> We are taking our input and splitting it into 3 files, so MultipleOutputs
> seems to be a natural choice. Performance is a bit slow, though.
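The slowdown described above comes down to repeating expensive setup on every write. A minimal sketch of the alternative, in plain Java with no Hadoop dependency, is to cache one writer per output path and reuse it. `SketchWriter` and `CachedOutputs` are hypothetical stand-ins invented for illustration; they are not Hadoop classes, and this is not a description of how MultipleOutputs itself is implemented.

```java
import java.util.HashMap;
import java.util.Map;

class WriterCacheSketch {

    // Hypothetical stand-in for a writer that is expensive to create (not a Hadoop class).
    static class SketchWriter {
        static int created = 0;                 // counts constructed writers
        final String path;
        final StringBuilder buffer = new StringBuilder();

        SketchWriter(String path) {
            this.path = path;
            created++;
        }

        void write(String key, String value) {
            buffer.append(key).append('\t').append(value).append('\n');
        }
    }

    // Keeps one writer per output path so repeated writes reuse it.
    static class CachedOutputs {
        private final Map<String, SketchWriter> writers = new HashMap<>();

        void write(String key, String value, String path) {
            // computeIfAbsent constructs the writer only on the first write to this path
            writers.computeIfAbsent(path, SketchWriter::new).write(key, value);
        }

        int openWriters() {
            return writers.size();
        }
    }

    public static void main(String[] args) {
        CachedOutputs out = new CachedOutputs();
        // 1000 records split across 3 output paths
        for (int i = 0; i < 1000; i++) {
            out.write("k" + i, "v" + i, "part-" + (i % 3));
        }
        System.out.println("writers created: " + SketchWriter.created); // prints "writers created: 3"
    }
}
```

With three output files, the per-record cost is then a map lookup rather than a fresh setup, which is the same effect the switch to write(output, key, keyValue) is reported to have had.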
> From: David Poisson [[EMAIL PROTECTED]]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: [EMAIL PROTECTED]
> Subject: Profiling map reduce jobs?
> I want to take a look at an MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (Cloudera CDH4).
> I have modified my mapred-site.xml with the following (that last one is
> commented out because it crashes my MR job):
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
> Ideally, we'd want to know where we are spending most of our time.
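Short of a full profiler, a coarse answer to "where is the time going" can come from timing named sections of the code and summing the results. The following is a minimal sketch in plain Java, assuming the work can be wrapped in `Runnable` blocks; `SectionTimer` and the section names are hypothetical, and nothing here is part of Hadoop or of the profiling settings mentioned in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SectionTimer {
    // Accumulated wall-clock nanoseconds per section, in first-seen order
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Runs the task and adds its elapsed time to the named section,
    // even if the task throws.
    void time(String section, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            totals.merge(section, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a section, or 0 if it was never timed
    long totalNanos(String section) {
        return totals.getOrDefault(section, 0L);
    }

    // Name of the section with the largest accumulated time, or null if none
    String slowest() {
        String best = null;
        long bestNanos = -1L;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (e.getValue() > bestNanos) {
                best = e.getKey();
                bestNanos = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        StringBuilder sink = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            timer.time("parse", () -> Integer.parseInt(Integer.toString(n)));
            timer.time("write", () -> sink.append(n).append('\n'));
        }
        System.out.println("parse ns: " + timer.totalNanos("parse"));
        System.out.println("write ns: " + timer.totalNanos("write"));
        System.out.println("slowest:  " + timer.slowest());
    }
}
```

Wall-clock totals like these are noisy for very short sections, so they are better at spotting a section that dominates than at ranking close ones.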