MapReduce, mail # dev - Re: map-reduce-related school project help

map-reduce-related school project help
rshepherd 2012-11-26, 01:38
Hi everybody,

I am a student at NYU and am evaluating an idea for a final project for a
distributed systems class. The idea is roughly as follows: the overhead
for running map-reduce on a 'small' job is high. (A small job would be
defined as something fitting in memory on a single machine.) Can
hadoop's map-reduce be modified to be efficient for jobs such as this?

It seems that one way to begin to achieve this goal would be to
modify the way the intermediate key-value pairs are handled, the
"handoff" from the map to the reduce. Rather than writing them to HDFS,
either pass them directly to a reducer or keep them in memory in a data
structure. Using a single, shared hashmap would remove the need to
sort the mapper output. Instead, the hashmap's slots could perhaps be
distributed to a reducer or reducers on multiple threads. My hope is
that, as this is a simplification of distributed map-reduce, it will be
relatively straightforward to alter the code to use an in-memory
approach that performs very well for the special case of smaller jobs.
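
To make the idea concrete, here is a minimal standalone sketch of the
in-memory handoff described above. It is not Hadoop code and uses no
Hadoop APIs; the class name InMemoryMapReduce and the methods run and
wordCount are hypothetical, chosen only for illustration. Mappers emit
pairs into one shared ConcurrentHashMap, which groups values by key
without any sort, and each key's group is then handed to a reducer task
on a thread pool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical illustration only: a single-process map-reduce that keeps
// intermediate pairs in a shared hash map instead of writing and sorting them.
public class InMemoryMapReduce {

    public static <I, K, V, R> Map<K, R> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer,
            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Shared intermediate store: grouping by key replaces the sort.
            ConcurrentHashMap<K, Queue<V>> groups = new ConcurrentHashMap<>();

            // Map phase: one task per input record, all emitting into the shared map.
            List<Future<?>> mapTasks = new ArrayList<>();
            for (I input : inputs) {
                Runnable task = () -> mapper.accept(input, (k, v) ->
                        groups.computeIfAbsent(k, x -> new ConcurrentLinkedQueue<>()).add(v));
                mapTasks.add(pool.submit(task));
            }
            for (Future<?> f : mapTasks) {
                f.get(); // wait until every mapper has finished
            }

            // Reduce phase: each key's values go to a reducer task on the pool.
            Map<K, Future<R>> pending = new HashMap<>();
            for (Map.Entry<K, Queue<V>> e : groups.entrySet()) {
                Callable<R> task = () -> reducer.apply(e.getKey(), new ArrayList<>(e.getValue()));
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<K, R> results = new HashMap<>();
            for (Map.Entry<K, Future<R>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("in-memory job failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    // Example job: word count over a list of lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        return InMemoryMapReduce.<String, String, Integer, Integer>run(
                lines,
                (line, emit) -> {
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) {
                            emit.accept(word, 1);
                        }
                    }
                },
                (word, counts) -> counts.stream().mapToInt(Integer::intValue).sum(),
                4);
    }
}
```

The sketch assumes the whole input and all intermediate pairs fit in
memory on one machine, which is the 'small' job case defined above. It
runs one reducer task per key for simplicity; partitioning the keys
across a fixed number of reducer threads would be closer to the "slots"
idea.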

I was hoping that someone on the list could help me with the following:

1) Does this sound like a good idea that might be achievable in a few weeks?
2) Does my intuition about how to achieve the goal seem reasonable?
3) If so, any advice on how to navigate the code base? (Any pointers on
packages/classes of interest would be highly appreciated)
4) Any other feedback?

Thanks in advance to anyone willing and able to help!