I'm curious about profiling. I've seen some documentation about it (for
1.0.3 on AWS), but its references to JobConf seem to be for the "old API",
and I've got everything running on the "new API".
I've got a job that processes about 30 GB of compressed CSVs, and it's
taking over a day on 3 m1.medium boxes. That's longer than I expected, so
I'd like to see where the time is being spent.
I've never set up any kind of profiling, so I don't really know what to do.
Any pointers to help me set up what's suggested here? Am I correct in
understanding that this doc is a little outdated?
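
For what it's worth, my reading is that the old-API JobConf profiling setters just write ordinary configuration properties, so the same keys could probably be set on the new-API job's Configuration. Below is an untested sketch of what I have in mind. The property names are the 1.x ones as I understand them; the class name, method name, and task ranges are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the profiling properties that, as far as I can
// tell, the old-API JobConf setters write under the hood in Hadoop 1.x.
class ProfilingProps {

    // mapRange / reduceRange are task-ID ranges such as "0-2" (assumed values).
    static Map<String, String> profilingProperties(String mapRange, String reduceRange) {
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("mapred.task.profile", "true");            // turn profiling on
        props.put("mapred.task.profile.maps", mapRange);      // which map tasks to profile
        props.put("mapred.task.profile.reduces", reduceRange); // which reduce tasks to profile
        return props;
    }

    public static void main(String[] args) {
        // In a real driver I would presumably copy these onto the job instead:
        //   for (Map.Entry<String, String> e : props.entrySet())
        //       job.getConfiguration().set(e.getKey(), e.getValue());
        for (Map.Entry<String, String> e : profilingProperties("0-2", "0-1").entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

I've left mapred.task.profile.params alone on the assumption that the default hprof settings are a reasonable starting point. Is this roughly the right idea?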