Re: Shuffling over the network for local map data.
Hi Luke,

I checked the /etc/hosts and it is configured correctly. Looks like the
slow shuffle read speeds we were getting are due to slow disk IO.

I will go through the MAPREDUCE-4049 change and see if I can update my
patch to work with that code on version 3.0.0.

I did not think of EC2; that is a good idea.


On Tue, Jan 22, 2013 at 11:24 AM, Luke Lu <[EMAIL PROTECTED]> wrote:

> You can set up the right /etc/hosts to support the loopback. OTOH, saving
> disk IO would be more important for small clusters with large instances.
> Hadoop has historically run on large clusters with relatively small
> instances, so the issue was not as acute. MAPREDUCE-4049 makes the shuffle
> pluggable, so you won't have to patch the Hadoop framework code itself.
> Are you saying that you don't have access to EC2?
>
> On Tue, Jan 22, 2013 at 11:02 AM, Suresh Kumar <[EMAIL PROTECTED]> wrote:
>
> > I have a patch that tries to use file links instead of making a copy of
> > the data that is already available locally. I tested it on a
> > single-machine cluster configuration running 48 mappers and reducers.
> > Unfortunately, I do not have access to a cluster, even a small one. Can
> > someone review and test run my patch?
> >
> > I created the patch using Eclipse against 1.0.3. My knowledge of Java is
> > limited and the code is not well written in some classes, so please let me
> > know if I need to make changes to the code, along with a short explanation
> > of the change. I will happily do so.
> >
> > Thanks,
> > Suresh.
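
The file-link idea described in the original message can be sketched in Java
roughly as follows. This is only an illustration of the general technique, not
the patch under discussion: the class name LocalMapOutputLinker, the method
linkOrCopy, and the file paths are all hypothetical, and the sketch assumes
Java 7 or later (for java.nio.file), which is newer than what Hadoop 1.0.3
targeted. It tries to hard-link a map output file that is already on the local
disk and falls back to a plain copy when linking is not possible.

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of Hadoop and not the patch from this thread.
public class LocalMapOutputLinker {

    /**
     * Makes a local map output file available at reducerPath.
     * Tries a hard link first so no data is rewritten to disk, and
     * falls back to a regular copy if the link cannot be created
     * (for example, when the two paths are on different volumes).
     *
     * @return true if a hard link was created, false if the data was copied
     */
    public static boolean linkOrCopy(Path mapOutput, Path reducerPath)
            throws IOException {
        // createLink fails if the target name already exists.
        Files.deleteIfExists(reducerPath);
        try {
            Files.createLink(reducerPath, mapOutput);
            return true;
        } catch (UnsupportedOperationException | FileSystemException e) {
            // Hard links unsupported or not permitted here: copy instead.
            Files.copy(mapOutput, reducerPath,
                    StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

A hard link only works when both paths live on the same file system, which is
why the copy fallback is kept. Whether a link or a copy is made, the reducer
side sees an ordinary file with the same contents.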