Is there a description of how MapReduce under Hadoop 2.0 (YARN) assigns mapper tasks to preferred nodes? I think someone on the list previously mentioned that it attempts to assign "one HDFS block per mapper task", but given that each block can have several replicas on different nodes, how does MapReduce obtain an even task assignment across the cluster while still optimizing for data locality?
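For what it's worth, my rough mental model of the policy is below. This is NOT the actual YARN scheduler code (which also involves rack-local fallback and delay scheduling); it is just a self-contained toy sketch of a greedy "locality first, balance second" assignment, where each split carries the hosts of its block's replicas and each node has a fixed number of map slots. All names here (`assign_splits`, the example hosts "A"/"B"/"C") are invented for illustration.

```python
from collections import defaultdict

def assign_splits(splits, slots_per_node):
    """Toy locality-aware assignment (not Hadoop's real algorithm).

    splits: list of (split_id, replica_hosts) pairs, one split per HDFS block.
    slots_per_node: dict mapping node -> number of free map slots.
    Returns dict mapping node -> list of split_ids assigned to it.
    """
    remaining = dict(slots_per_node)
    assignment = defaultdict(list)
    leftovers = []
    for split_id, hosts in splits:
        # Data-local pass: among the replica hosts that still have free
        # slots, pick the least-loaded one (most remaining slots).
        local = [h for h in hosts if remaining.get(h, 0) > 0]
        if local:
            node = max(local, key=lambda h: remaining[h])
            assignment[node].append(split_id)
            remaining[node] -= 1
        else:
            leftovers.append(split_id)
    # Fallback pass: splits with no free replica host go to the
    # least-loaded node anywhere (rack-local / off-switch in reality).
    for split_id in leftovers:
        node = max(remaining, key=remaining.get)
        assignment[node].append(split_id)
        remaining[node] -= 1
    return dict(assignment)
```

With three splits, two replicas each, and one slot per node, every split lands data-local and the load stays even:

```python
splits = [("s0", ["A", "B"]), ("s1", ["A", "C"]), ("s2", ["B", "C"])]
assign_splits(splits, {"A": 1, "B": 1, "C": 1})
```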
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 200 | Boulder, CO 80302
T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781 705 2077
Skype: jlilley.redpoint | [EMAIL PROTECTED] | www.redpoint.net