MapReduce >> mail # dev >> map-reduce-related school project help

Re: map-reduce-related school project help
Hi Jiang, thanks for your response.

I think the idea would be to make the map-reduce programming paradigm
usable for small, local jobs. In other words, provide a way to take
existing jobs that run in a distributed fashion and run them on a
machine-local implementation instead. Part of the purpose is
educational: to illustrate how map-reduce is implemented and the
trade-offs involved. I hope this clarifies things.

On 11/25/12 9:54 PM, [EMAIL PROTECTED] wrote:
> Hi Randy,
> The intermediate key-value pairs are not written to HDFS; they are written to the local file system. Besides, if the job is "small", why use MapReduce at all? You could just run it on a single machine.
> Jiang Shan
> From: rshepherd
> Date: 2012-11-26 09:38
> To: mapreduce-dev
> Subject: map-reduce-related school project help
> Hi everybody,
> I am a student at NYU and am evaluating an idea for a final project in a
> distributed systems class. The idea is roughly as follows: the overhead
> of running map-reduce on a 'small' job is high. (A small job would be
> defined as one whose data fits in memory on a single machine.) Can
> Hadoop's map-reduce be modified to be efficient for jobs like this?
> It seems that one way to begin achieving this goal would be to modify
> the way the intermediate key-value pairs are handled: the "handoff"
> from the map to the reduce. Rather than writing them to HDFS, either
> pass them directly to a reducer or keep them in memory in a data
> structure. Using a single, shared hashmap would eliminate the need to
> sort the mapper output; its slots could instead be distributed to a
> reducer or reducers running on multiple threads. My hope is that, since
> this is a simplification of distributed map-reduce, it will be
> relatively straightforward to alter the code to an in-memory approach
> that performs very well for this special case of smaller jobs.
> I was hoping that someone on the list could help me with the following
> questions:
> 1) Does this sound like a good idea that might be achievable in a few weeks?
> 2) Does my intuition about how to achieve the goal seem reasonable?
> 3) If so, any advice on how to navigate the code base? (Any pointers to
> packages/classes of interest would be highly appreciated.)
> 4) Any other feedback?
> Thanks in advance to anyone willing and able to help!
> Randy
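The shared-hashmap, threaded-reducer handoff proposed above could be sketched roughly as follows (hypothetical code, not Hadoop's implementation; `ThreadedWordCount` and its partitioning scheme are illustrative assumptions):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the proposal above: mapper tasks write
// intermediate results into one shared ConcurrentHashMap, and the map's
// keys ("slots") are then partitioned by hash across reducer threads.
public class ThreadedWordCount {

    public static Map<String, Long> run(List<String> lines, int nThreads)
            throws InterruptedException {
        ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

        // Map phase: one task per input line, all writing to the shared map.
        // LongAdder combines counts as they arrive, so no sort is needed.
        ExecutorService mappers = Executors.newFixedThreadPool(nThreads);
        for (String line : lines) {
            mappers.submit(() -> {
                for (String w : line.split("\\s+")) {
                    if (!w.isEmpty()) {
                        counts.computeIfAbsent(w, k -> new LongAdder())
                              .increment();
                    }
                }
            });
        }
        mappers.shutdown();
        mappers.awaitTermination(1, TimeUnit.MINUTES);

        // Reduce phase: partition keys across reducer threads by hash, each
        // thread folding only its own disjoint partition into the result.
        ConcurrentHashMap<String, Long> result = new ConcurrentHashMap<>();
        ExecutorService reducers = Executors.newFixedThreadPool(nThreads);
        for (int r = 0; r < nThreads; r++) {
            final int part = r;
            reducers.submit(() -> counts.forEach((k, v) -> {
                if (Math.floorMod(k.hashCode(), nThreads) == part) {
                    result.put(k, v.sum());
                }
            }));
        }
        reducers.shutdown();
        reducers.awaitTermination(1, TimeUnit.MINUTES);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a b a", "b c", "c a"), 2));
    }
}
```

Because each reducer thread owns a disjoint hash partition of the keys, no coordination is needed in the reduce phase; this mirrors how distributed map-reduce routes keys to reducers by partitioner, just without the network or the sort.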