Re: will an application with two maps but no reduce be suitable for Hadoop?
Roman Shaposhnik 2013-04-18, 15:29
On Thu, Apr 18, 2013 at 4:49 AM, Hadoop Explorer
<[EMAIL PROTECTED]> wrote:
> I have an application that evaluates a graph using this algorithm:
> - use a parallel for loop to evaluate all nodes in a graph (to evaluate a
> node, an image is read, and then the result for this node is calculated)
> - use a second parallel for loop to evaluate all edges in the graph. The
> function would take in results from both nodes of the edge, and then
> calculate the answer for the edge
> As you can see, the above algorithm would employ two map functions, but no
> reduce function. The total data size can be very large (say 100GB). Also,
> the workload of each node and each edge is highly irregular, and thus load
> balancing mechanisms are essential.
> In this case, will Hadoop suit this application? If so, what would the
> architecture of my program look like? And will Hadoop be able to strike a
> balance between good load balancing of the second map function and
> minimizing data transfer of the results from the first map function?
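For concreteness, the two-pass structure described in the question can be
sketched on a single machine as below. This is plain Python, not Hadoop code:
the toy graph, the function names, and the placeholder arithmetic are all
hypothetical, and a thread pool merely stands in for a cluster scheduler.
Because idle workers pull the next pending item, uneven per-node or per-edge
cost is balanced dynamically, which is the property the question asks about.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: node id -> stand-in for the image data read per node.
NODES = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
EDGES = [("a", "b"), ("b", "c"), ("a", "c")]


def evaluate_node(item):
    """First map: compute one node's result (placeholder: sum of the values)."""
    node_id, pixels = item
    return node_id, sum(pixels)


def evaluate_edge(edge, node_results):
    """Second map: combine the results of both endpoints (placeholder: product)."""
    u, v = edge
    return edge, node_results[u] * node_results[v]


def run_two_pass(nodes, edges, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Pass 1: each idle worker takes the next queued node.
        node_results = dict(pool.map(evaluate_node, nodes.items()))
        # Pass 2: runs only after every node result from pass 1 is available.
        edge_results = dict(
            pool.map(lambda e: evaluate_edge(e, node_results), edges)
        )
    return node_results, edge_results


if __name__ == "__main__":
    node_results, edge_results = run_two_pass(NODES, EDGES)
    print(node_results)
    print(edge_results)
```

Note that the second pass needs the results of both endpoints of each edge, so
on a cluster those results have to be written somewhere every worker can read
them, or shipped to wherever the edge is evaluated. That is the data-transfer
cost the question raises.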
Map-only jobs are well known in the Hadoop ecosystem. For example, that's how
Giraph implements BSP (bulk synchronous parallel) on top of Hadoop. In fact,
from what you're describing, it sounds like Giraph could be a good fit. Check it out: