psdc1978 2010-06-09, 15:17
> If I define in mapred-site.xml the property mapred.reduce.tasks to 1, how
> many reduce tasks will actually run? I think it will run 2 and I don't know
Did you actually observe this happening? Also, can you give some
information about the cluster where you're running into this problem,
for example whether it is a single-node / pseudo-distributed setup?
> But in a log that I've added, the two constructors of the
> ReduceTask.java class will run ( ReduceTask() and ReduceTask(with
> parameters) ).
> I don't understand why ReduceTask() [with no parameters] will run. Here's
> the stacktrace that I get to understand the thread of execution of this
> at org.apache.hadoop.mapred.ReduceTask.<init>(ReduceTask.java:164)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
> As you can see, the ReduceTask() comes from the Connection class. Can
> anyone explain to me what the purpose of this thread of execution is, and
> what the purpose of the Client class is?
The TaskTracker gets 'work to do' in the form of Map or Reduce task
objects from the JobTracker. These arrive in the response to the 'heartbeat'
that it periodically sends to the JobTracker. The name 'HeartbeatResponse'
in the stack trace tells you this is where the object came from.
This communication happens over Hadoop's custom IPC that uses a
serialization format called 'Writables'. When an object (in this case
the 'ReduceTask') needs to be read off the stream, it gets
deserialized and an object of the corresponding type is instantiated.
To do this, each Writable defines a no-argument constructor, which is
called during deserialization before the fields are read from the
stream. That would explain the call to the no-argument constructor of
ReduceTask.
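
To make the deserialization step concrete, here is a minimal sketch of
the same pattern using only java.io. It does not use Hadoop's classes:
SimpleWritable, SketchTask and roundTrip are hypothetical names invented
for this illustration, standing in for the Writable interface, a task
object such as ReduceTask, and the IPC layer. The point is only that the
receiving side first creates an empty object through the no-argument
constructor and then fills it in from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Hypothetical stand-in for a Writable-style interface (not Hadoop's own).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical stand-in for a task object that travels over IPC.
    static class SketchTask implements SimpleWritable {
        // Counts calls to the no-argument constructor, for illustration only.
        static int noArgCalls = 0;

        String jobId = "";
        int partition = -1;

        // Used by the receiving side during deserialization.
        SketchTask() {
            noArgCalls++;
        }

        // Used by the sending side when it creates real work.
        SketchTask(String jobId, int partition) {
            this.jobId = jobId;
            this.partition = partition;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(jobId);
            out.writeInt(partition);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            jobId = in.readUTF();
            partition = in.readInt();
        }
    }

    // Serializes a task to bytes, then rebuilds it the way a receiver would:
    // instantiate through the no-argument constructor, then read the fields.
    static SketchTask roundTrip(SketchTask original) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        SketchTask copy = SketchTask.class.getDeclaredConstructor().newInstance();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws Exception {
        SketchTask received = roundTrip(new SketchTask("job_0001", 0));
        System.out.println(received.jobId + " partition=" + received.partition
                + " noArgCalls=" + SketchTask.noArgCalls);
    }
}
```

In this sketch the sender only ever calls the two-argument constructor,
yet the counter shows one call to the no-argument constructor, made by
the receiving side. That matches the two constructor calls seen in the
log: they do not mean that two reduce tasks ran.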
> 2 -
> A ReduceTask is launched by a TaskTracker in a new child JVM, right?
Mostly, yes. The exception is when the 'JVM Reuse' feature is turned
on, in which case the TaskTracker has the option to launch a ReduceTask
in a JVM previously launched for the same job's reduces.
> 3 -
> A TaskTracker is a thread that can run several map and reduces at the
> same time, right?
A TaskTracker is a Java process, not a thread, and it can run several maps and reduces concurrently.