Re: Shuffling over the network for local map data.
Luke Lu 2013-01-22, 19:24
You can set up the right /etc/hosts to support the loopback. OTOH, saving
disk I/O would be more important for small clusters with large instances.
Hadoop has historically run on large clusters with relatively small
instances, so the issue was not as acute. MAPREDUCE-4049 allows the shuffle
to be pluggable, so you won't have to patch the Hadoop framework code itself.
Are you saying that you don't have access to EC2?
On Tue, Jan 22, 2013 at 11:02 AM, Suresh Kumar <[EMAIL PROTECTED]> wrote:
> I have a patch that tries to use file links instead of making a copy of
> the data that is already available locally. I tested it on a single-machine
> cluster configuration running 48 mappers and reducers. Unfortunately, I
> do not have access to a cluster, even a small one. Can someone
> review and test-run my patch?
> I created the patch using Eclipse against 1.0.3. My knowledge of Java is
> limited and the code is not well written in some classes. So please let me
> know if I need to make changes to the code, along with a short explanation
> of the change. I will happily do so.
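
For readers following the thread, the "file links instead of making a copy" idea can be sketched roughly as below. This is not the code from the patch, which is not shown in this thread. The class name LocalShuffleLink and the method linkOrCopy are hypothetical, and the sketch assumes Java 7 or later for java.nio.file, whereas Hadoop 1.0.3 itself targeted Java 6. It tries a hard link to map output that is already on the local disk and falls back to a plain copy when linking is not possible (for example, across file systems).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not part of Hadoop or the patch.
final class LocalShuffleLink {

    /**
     * Makes 'target' refer to the same bytes as the local file 'source'.
     * Returns true if a hard link was created, false if the data was copied.
     */
    static boolean linkOrCopy(Path source, Path target) throws IOException {
        // Remove any stale file left from an earlier attempt.
        Files.deleteIfExists(target);
        try {
            // A hard link shares the data blocks, so no extra disk I/O for the data.
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Linking can fail across devices or on file systems without
            // hard-link support; fall back to an ordinary copy.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

A caller would pass the path of a local map output file and the path where the reducer expects its input. Whether a link is safe depends on when the framework deletes or rewrites the map output, which this sketch does not address.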