Re: Shuffling over the network for local map data.
Suresh Kumar 2013-01-22, 22:22
I checked /etc/hosts and it is configured correctly. It looks like the
slow shuffle read speeds we were getting are due to slow disk I/O.
I will go through the MAPREDUCE-4049 change and see if I can update my
patch to work with that code on version 3.0.0.
I had not thought of EC2; that is a good idea.
On Tue, Jan 22, 2013 at 11:24 AM, Luke Lu <[EMAIL PROTECTED]> wrote:
> You can set up the right /etc/hosts to support the loopback. OTOH, saving
> disk I/O would be more important for small clusters with large instances.
> Hadoop historically works on large clusters with relatively small
> instances, so the issue was not as acute. MAPREDUCE-4049 allows the shuffle
> to be pluggable, so you won't have to patch Hadoop framework code itself.
> Are you saying that you don't have access to EC2?
> On Tue, Jan 22, 2013 at 11:02 AM, Suresh Kumar <[EMAIL PROTECTED]> wrote:
> > I have a patch that tries to use file links instead of making a copy of
> > data that is already available locally. I tested it on a single-machine
> > cluster configuration running 48 mappers and reducers. Unfortunately, I
> > do not have access to a cluster, not even a small one. Can someone
> > review and test-run my patch?
> > I created the patch using Eclipse against 1.0.3. My knowledge of Java is
> > limited, and the code is not well written in some classes. So please let
> > me know if I need to make changes to the code, along with a short
> > explanation of the change. I will happily do so.
> > Thanks,
> > Suresh.
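
For readers following the thread, the idea of using a file link instead of a copy for map output that is already on the local disk can be sketched roughly as below. This is not the patch discussed above; the class name LocalShuffleLink and the method linkOrCopy are hypothetical, and the sketch assumes Java 7 or later for java.nio.file (Hadoop 1.0.3 itself targeted older Java versions, so the real patch likely used a different mechanism). It tries a hard link first and falls back to a plain copy when linking is not possible, for example across file systems.

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not taken from the patch in this thread.
final class LocalShuffleLink {

    /**
     * Makes {@code target} hold the same bytes as {@code source}.
     * Tries a hard link first (no data is copied); falls back to a copy.
     *
     * @return true if a hard link was created, false if the data was copied
     */
    static boolean linkOrCopy(Path source, Path target) throws IOException {
        // Remove any stale target so createLink does not fail on an existing file.
        Files.deleteIfExists(target);
        try {
            // Note the argument order: the new link comes first, the existing file second.
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | FileSystemException e) {
            // Hard links may be unsupported or cross a file system boundary; copy instead.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shuffle-demo");
        Path mapOutput = dir.resolve("map.out");
        Files.write(mapOutput, "key\tvalue\n".getBytes("UTF-8"));

        Path reduceInput = dir.resolve("reduce.in");
        boolean linked = linkOrCopy(mapOutput, reduceInput);
        System.out.println(linked ? "hard link created" : "fell back to copy");
    }
}
```

A hard link avoids the extra disk write but shares the underlying data with the source, so this only makes sense if the map output is not modified after the link is created.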