Hadoop >> mail # user >> Measuring running times


Antonio D'Ettole 2010-03-17, 11:47
Re: Measuring running times
At the default log level, Hadoop job logs (the ones you also get in the
job's output directory under _logs/history) contain entries like the
following:

ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
START_TIME="1220331166789"
HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001"
SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254"
HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

That gives you the start time, shuffle finish time, sort finish time, and
overall finish time for each reduce attempt. Similarly, MapAttempt entries
give you start and finish times for map tasks.
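A small script can merge the two records for each attempt and derive per-phase durations. This is only a sketch based on the field names shown in the log lines above (`START_TIME`, `SHUFFLE_FINISHED`, `SORT_FINISHED`, `FINISH_TIME`, all epoch milliseconds); real history files may contain additional record types and continuation lines that would need handling.

```python
import re

# Sample history-log lines, as quoted above (timestamps in epoch milliseconds).
LINES = [
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" '
    'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
    'START_TIME="1220331166789" '
    'HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002" '
    'TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0" '
    'TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001" '
    'SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254" '
    'HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"',
]

# KEY="value" pairs, as they appear in the history log.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def phase_times(lines):
    """Merge start/finish records per reduce attempt; return phase durations in ms."""
    attempts = {}
    for line in lines:
        if not line.startswith("ReduceAttempt"):
            continue
        fields = dict(ATTR.findall(line))
        # Both the start record and the finish record carry TASK_ATTEMPT_ID,
        # so we accumulate their fields into one dict per attempt.
        attempts.setdefault(fields["TASK_ATTEMPT_ID"], {}).update(fields)

    result = {}
    for attempt_id, f in attempts.items():
        start = int(f["START_TIME"])
        shuffle_done = int(f["SHUFFLE_FINISHED"])
        sort_done = int(f["SORT_FINISHED"])
        finish = int(f["FINISH_TIME"])
        result[attempt_id] = {
            "shuffle_ms": shuffle_done - start,
            "sort_ms": sort_done - shuffle_done,
            "reduce_ms": finish - sort_done,
        }
    return result

print(phase_times(LINES))
```

For the sample attempt above this reports a shuffle of 869212 ms, a sort of 13 ms, and a reduce of 27240 ms. The same idea applies to MapAttempt records, which only carry a start and a finish time.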

Hope this helps,

Simone

On 03/17/10 12:47, Antonio D'Ettole wrote:
> Hi everybody,
> as part of my project work at school I'm running some Hadoop jobs on a
> cluster. I'd like to measure exactly how long each phase of the process
> takes: mapping, shuffling (ideally divided into copying and sorting) and
> reducing. The tasktracker logs do not seem to supply the start/end times for
> each phase, at least not all of them, even when the log level is set to
> DEBUG.
> Do you have any ideas on how I could work this out?
> Thanks
> Antonio
>
--
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [EMAIL PROTECTED]
http://www.crs4.it
Owen O'Malley 2010-03-17, 15:45
Antonio D'Ettole 2010-03-17, 22:16