Re: Shuffling over the network for local map data.
Suresh Kumar 2013-01-22, 22:22
Hi Luke,
I checked /etc/hosts and it is configured correctly. It looks like the slow
shuffle read speeds we were getting are due to slow disk I/O.

I will go through the MAPREDUCE-4049 change and see if I can update my patch
to work with that code on version 3.0.0.

I did not think of EC2; that is a good idea.

Thanks,
Suresh.

On Tue, Jan 22, 2013 at 11:24 AM, Luke Lu <[EMAIL PROTECTED]> wrote:

> You can set up the right /etc/hosts to support the loopback. OTOH, saving
> disk I/O would be more important for small clusters with large instances.
> Hadoop historically works on large clusters with relatively small
> instances, so the issue was not as acute. MAPREDUCE-4049 allows the shuffle
> to be pluggable, so you won't have to patch Hadoop framework code itself.
>
> Are you saying that you don't have access to EC2?
>
> On Tue, Jan 22, 2013 at 11:02 AM, Suresh Kumar <[EMAIL PROTECTED]> wrote:
>
> > I have a patch that tries to use file links instead of making a copy of
> > the data that is already available locally. I tested it on a
> > single-machine cluster configuration running 48 mappers and reducers.
> > Unfortunately, I do not have access to a cluster, even a small one. Can
> > someone review and test run my patch?
> >
> > I created the patch using Eclipse against 1.0.3. My knowledge of Java is
> > limited and the code is not well written in some classes. So please let
> > me know if I need to make changes to the code, along with a short
> > explanation of the change. I will happily do so.
> >
> > Thanks,
> > Suresh.
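
For readers following the thread, the idea of linking to locally available map output instead of copying it can be sketched roughly as below. This is only an illustration, not the patch discussed above and not Hadoop code: the class name LocalMapOutputLinker, the method linkOrCopy, and the file names are all hypothetical. It assumes Java 7 or later (java.nio.file) and that both paths are on the same file system, which a hard link requires; otherwise it falls back to a plain copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; none of these names come from Hadoop or from the patch.
final class LocalMapOutputLinker {

    /**
     * Makes the local map output visible at reducerInput.
     * Returns true if a hard link was created, false if the data was copied.
     */
    static boolean linkOrCopy(Path mapOutput, Path reducerInput) throws IOException {
        // Remove a stale target so that createLink does not fail on it.
        Files.deleteIfExists(reducerInput);
        try {
            // A hard link shares the data blocks, so no bytes are rewritten.
            Files.createLink(reducerInput, mapOutput);
            return true;
        } catch (UnsupportedOperationException | FileSystemException e) {
            // Links unsupported, or the paths are on different volumes:
            // fall back to the ordinary copy.
            Files.copy(mapOutput, reducerInput, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo using a temporary directory.
        Path dir = Files.createTempDirectory("shuffle-demo");
        Path mapOutput = dir.resolve("map_0.out");
        Files.write(mapOutput, "key\tvalue\n".getBytes(StandardCharsets.UTF_8));

        Path reducerInput = dir.resolve("reduce_0.in");
        boolean linked = linkOrCopy(mapOutput, reducerInput);
        System.out.println(linked ? "hard link created" : "fell back to copy");
    }
}
```

Whether the link or the copy path is taken depends on the local file system, so the demo prints which one happened rather than assuming either.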