MapReduce, mail # user - will an application with two maps but no reduce be suitable for hadoop?

Hadoop Explorer 2013-04-18, 11:49
Re: will an application with two maps but no reduce be suitable for hadoop?
Roman Shaposhnik 2013-04-18, 15:29
On Thu, Apr 18, 2013 at 4:49 AM, Hadoop Explorer wrote:
> I have an application that evaluates a graph using this algorithm:
> - use a parallel for loop to evaluate all nodes in the graph (to evaluate a
> node, an image is read, and then the result for this node is calculated)
> - use a second parallel for loop to evaluate all edges in the graph.  The
> function would take in results from both nodes of the edge, and then
> calculate the answer for the edge
> As you can see, the above algorithm would employ two map functions, but no
> reduce function.  The total data size can be very large (say 100GB).  Also,
> the workload of each node and each edge is highly irregular, and thus load
> balancing mechanisms are essential.
> In this case, will Hadoop suit this application?  If so, what would the
> architecture of my program look like?  And will Hadoop be able to strike a
> balance between good load balancing of the second map function and
> minimizing data transfer of the results from the first map function?
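
For reference, the two-phase structure described in the quoted message can
be sketched in plain Java with no Hadoop dependency. This is only an
illustration under assumptions: the class name TwoPhaseGraphEval, the
placeholder functions evaluateNode and evaluateEdge, and the encoding of an
edge as a two-element int array are all hypothetical, and parallel streams
stand in for the cluster-level parallelism a real job would use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TwoPhaseGraphEval {
    // Hypothetical stand-in for "read an image and compute the node's result".
    static double evaluateNode(int nodeId) {
        return nodeId * 2.0;
    }

    // Hypothetical stand-in for the per-edge function of two node results.
    static double evaluateEdge(double a, double b) {
        return a + b;
    }

    // Phase 1: evaluate every node independently (a "map" with no reduce).
    static Map<Integer, Double> evaluateNodes(List<Integer> nodes) {
        return nodes.parallelStream()
                .collect(Collectors.toConcurrentMap(n -> n, n -> evaluateNode(n)));
    }

    // Phase 2: evaluate every edge from the results of its two endpoints.
    // Each edge is {fromNodeId, toNodeId}; both ids must be in nodeResults.
    static List<Double> evaluateEdges(List<int[]> edges, Map<Integer, Double> nodeResults) {
        return edges.parallelStream()
                .map(e -> evaluateEdge(nodeResults.get(e[0]), nodeResults.get(e[1])))
                .collect(Collectors.toList()); // keeps the input edge order
    }

    public static void main(String[] args) {
        List<Integer> nodes = Arrays.asList(1, 2, 3);
        List<int[]> edges = Arrays.asList(new int[]{1, 2}, new int[]{2, 3});

        Map<Integer, Double> nodeResults = evaluateNodes(nodes);
        List<Double> edgeResults = evaluateEdges(edges, nodeResults);
        System.out.println(edgeResults); // prints [6.0, 10.0]
    }
}
```

In Hadoop MapReduce, each phase would typically be its own job with the
number of reduce tasks set to zero (Job.setNumReduceTasks(0)), with the
output of the first job serving as input to the second. How the node
results reach the tasks that evaluate the edges is the data-transfer
question raised above, and this sketch does not address it.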

Map-only jobs are well known in the Hadoop ecosystem. For example, that's
how Giraph implements BSP (bulk synchronous parallel) on top of Hadoop. In
fact, from what you're describing, it sounds like Giraph could be a good
fit. Check it out: