Hadoop >> mail # user >> Why inter-rack communication in mapreduce slow?

Re: Why inter-rack communication in mapreduce slow?
Most of the network bandwidth used during a MapReduce job should come
from the shuffle/sort phase. This part doesn't use HDFS. The
TaskTrackers running reduce tasks will pull intermediate results from
TaskTrackers running map tasks over HTTP. In most cases, it's
difficult to get rack locality during this process because of the
contract between mappers and reducers: every reducer must fetch its
partition of the intermediate data from every mapper. If you wanted
locality, your data would already have to be grouped by key in your
source files and you'd need to use a custom partitioner.
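For context, the default assignment of keys to reducers is a plain hash. The snippet below is a minimal sketch (assuming it mirrors the arithmetic of Hadoop's default HashPartitioner; the class name here is invented) of why any mapper's output can land on any reducer, so each reducer ends up pulling from mappers on every rack unless the input is pre-grouped by key and a custom partitioner is supplied:

```java
// Sketch of Hadoop-style default hash partitioning (assumption: this
// matches HashPartitioner's arithmetic). Because the partition depends
// only on the key's hash, every mapper can emit records for every
// partition -- so each reducer must fetch from every mapper, and the
// shuffle cannot be confined to a single rack.
public class HashPartitionSketch {

    // Mask the sign bit so the result is non-negative, then take the
    // remainder modulo the number of reduce tasks.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Keys scatter across reducers regardless of which mapper
        // (and therefore which rack) emitted them.
        for (String key : new String[] {"alpha", "bravo", "charlie"}) {
            System.out.println(key + " -> reducer " + getPartition(key, 4));
        }
    }
}
```

A custom partitioner would replace `getPartition` with logic that routes each key range to a chosen reducer, which only buys locality if the source data is already grouped by key as described above.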


On Mon, Jun 6, 2011 at 9:49 AM,  <[EMAIL PROTECTED]> wrote:
> Yeah, the way you described it, maybe not. Because the hellabytes
> are all coming from one rack. But in reality, wouldn't this be
> more uniform because of how hadoop/hdfs work (distributed more evenly)?
> And if that is true, then for all the switched packets passing through
> the inter-rack switch, a consultation to the tracker would have preceded
> it?
> Well, I'm just theorizing, and I'm sure we'll see more concrete numbers
> on the relation between # racks, # nodes, # switches, # trackers and
> their configurations.
> I like your idea about racking the trackers though. so it won't need any
> tracker trackers?!? ;)
> On Mon, 06 Jun 2011 09:40:12 -0400, John Armstrong
> <[EMAIL PROTECTED]> wrote:
>> On Mon, 06 Jun 2011 09:34:56 -0400, <[EMAIL PROTECTED]> wrote:
>>> Yeah, that's a good point.
>>> I wonder, though, what the load on the tracker nodes (ports et al.) would
>>> be if an inter-rack fiber switch at tens of Gb/s is getting maxed out.
>>> Seems to me that if there is that much traffic being migrated across
>>> racks, the tracker node (or whatever node it is) would overload
>>> first?
>> It could happen, but I don't think it would always.  For example, the
>> tracker is on rack A; sees that the best place to put reducer R is on
>> rack B; sees reducer still needs a few hellabytes from mapper M on rack
>> C; tells M to send data to R; switches on B and C get throttled, leaving
>> A free to handle other things.
>> In fact, it almost makes me wonder if an ideal setup is not only to have
>> each of the main control daemons on their own nodes, but to put THOSE
>> nodes on their own rack and keep all the data elsewhere.

Joseph Echeverria
Cloudera, Inc.