Hadoop, mail # user - Estimating Time required to compute M/Rjob

real great.. 2011-04-16, 10:01
Sonal Goyal 2011-04-16, 13:39
Stephen Boesch 2011-04-16, 20:08
Stephen Boesch 2011-04-16, 20:19
Ted Dunning 2011-04-16, 22:03
real great.. 2011-04-17, 14:00
Matthew Foley 2011-04-17, 19:07
Lance Norskog 2011-04-17, 23:57
Re: Estimating Time required to compute M/Rjob
Ted Dunning 2011-04-18, 00:07
Turing completeness isn't the central question here, really.  The truth
is, map-reduce programs are under considerable pressure to be written in
a scalable fashion, which limits them to fairly simple behaviors that
result in a pretty linear dependence of run-time on input size for a
given program.
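
To make the "pretty linear dependence" concrete, here is a minimal sketch of
fitting run-time against input size by ordinary least squares. It assumes you
have a handful of past runs of the same program; the function names and the
sample numbers are made up for illustration and are not taken from the paper
or from Hadoop itself.

```python
def fit_linear(sizes, runtimes):
    """Least-squares fit of runtime = slope * size + intercept.

    sizes and runtimes are equal-length sequences from past runs of one job
    (hypothetical units: input size in GB, run-time in seconds).
    """
    n = len(sizes)
    if n < 2 or n != len(runtimes):
        raise ValueError("need at least two (size, runtime) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("all input sizes are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # fixed cost such as job startup
    return slope, intercept


def predict(model, size):
    """Estimate the run-time for a new input size from a fitted model."""
    slope, intercept = model
    return slope * size + intercept


# Illustrative numbers only: four earlier runs of the same job.
model = fit_linear([1, 2, 3, 4], [12, 22, 32, 42])
estimate = predict(model, 10)
```

The intercept soaks up fixed overhead, so the fit is only as good as the
assumption that the job really does scale linearly over the range you care
about.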

The cool thing about the paper that I linked to the other day is that
there are enough cues about the expected runtime of the program
available to make good predictions *without* looking at the details.
No doubt the estimation facility could make good use of something as
simple as the hash of the jar in question, but even without that it is
possible to produce good estimates.
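
As a rough illustration of the jar-hash idea (and not the method from the
paper), the sketch below keeps a history of seconds-per-byte keyed by a hash
of the job jar and falls back to the overall average when the jar has not
been seen before. The class, its methods, and the choice of SHA-256 are all
assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def jar_fingerprint(jar_bytes: bytes) -> str:
    """Identify a job by the SHA-256 digest of its jar contents."""
    return hashlib.sha256(jar_bytes).hexdigest()


class RuntimeHistory:
    """Hypothetical store of past runs, keyed by jar fingerprint."""

    def __init__(self):
        # Each entry is [total_input_bytes, total_seconds].
        self._per_jar = defaultdict(lambda: [0, 0.0])
        self._overall = [0, 0.0]

    def record(self, jar_hash, input_bytes, seconds):
        """Add one finished run to the per-jar and overall totals."""
        for totals in (self._per_jar[jar_hash], self._overall):
            totals[0] += input_bytes
            totals[1] += seconds

    def estimate(self, jar_hash, input_bytes):
        """Estimate seconds for a new run, assuming linear scaling.

        Uses the history for this jar when there is any, otherwise the
        average rate over every job recorded so far.
        """
        total_bytes, total_seconds = self._per_jar.get(jar_hash, self._overall)
        if total_bytes == 0:
            raise ValueError("no history available to estimate from")
        return total_seconds / total_bytes * input_bytes
```

Whether the per-jar rate or the overall rate is used, the estimate rests on
the same linear-scaling assumption discussed above.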

I suppose that this means that all of us Hadoop programmers are really
just kind of boring folk.  On average, anyway.

On Sun, Apr 17, 2011 at 12:07 PM, Matthew Foley <[EMAIL PROTECTED]> wrote:
> Since general M/R jobs vary over a huge (Turing problem equivalent!) range of behaviors, a more tractable problem might be to characterize the descriptive parameters needed to answer the question: "If the following problem P runs in T0 amount of time on a certain benchmark platform B0, how long T1 will it take to run on a differently configured real-world platform B1?"
James Seigel Tynt 2011-04-18, 00:25
real great.. 2011-04-18, 01:39
Matthew Foley 2011-04-18, 06:28
real great.. 2011-04-18, 08:49