HDFS >> mail # user >> Dealing with stragglers in hadoop

Dealing with stragglers in hadoop
I have a very simple use case.
I have an edge list and I am trying to convert it into an adjacency list:

src  target
a    b
a    c
b    d
b    e

and so on.
What I am trying to build is:

a [b,c]
b [d,e]
and so on.
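
To make the target output concrete, here is a minimal plain-Java sketch of the grouping, with no Hadoop dependencies. The class and method names (AdjacencySketch, toAdjacency) are made up for illustration; in the actual job the same grouping would be done by the shuffle plus a reducer keyed on the source node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of any Hadoop API.
class AdjacencySketch {
    // Groups {src, target} pairs by src, keeping sources in first-seen order.
    static Map<String, List<String>> toAdjacency(List<String[]> edges) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (String[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // The four edges from the example above.
        List<String[]> edges = Arrays.asList(
            new String[] {"a", "b"},
            new String[] {"a", "c"},
            new String[] {"b", "d"},
            new String[] {"b", "e"});
        toAdjacency(edges).forEach((src, targets) -> System.out.println(src + " " + targets));
    }
}
```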

But every now and then I hit a super node, which has millions of edges.

Keying on just the node id therefore results in poor MR execution, because
the one reducer that receives the super node becomes a straggler.
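
The skew can be sketched outside Hadoop. The snippet below is a rough illustration, assuming the job uses the default hash partitioning, which picks a reducer as (hashCode & Integer.MAX_VALUE) % numReduceTasks. String.hashCode stands in for the hash of the real key type, and the class name, method names and edge counts are all made up. Every edge of a given source node hashes to the same reducer, so the super node's whole edge list lands on one task.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of any Hadoop API.
class PartitionSkewSketch {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reducers.
    // String.hashCode is a stand-in for the hash of the real key type.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many edge records each reducer would receive
    // when records are keyed on the source node id.
    static long[] reducerLoads(String[] sourceKeys, int numReducers) {
        long[] loads = new long[numReducers];
        for (String key : sourceKeys) {
            loads[partitionFor(key, numReducers)]++;
        }
        return loads;
    }

    public static void main(String[] args) {
        // Assumed data: 1,000 ordinary nodes with 2 edges each,
        // plus one super node with 100,000 edges.
        int ordinaryNodes = 1000;
        int superEdges = 100_000;
        String[] keys = new String[ordinaryNodes * 2 + superEdges];
        int i = 0;
        for (int n = 0; n < ordinaryNodes; n++) {
            keys[i++] = "node" + n;
            keys[i++] = "node" + n;
        }
        Arrays.fill(keys, i, keys.length, "supernode");

        // One entry of the printed array dominates: the reducer that owns "supernode".
        System.out.println(Arrays.toString(reducerLoads(keys, 8)));
    }
}
```

Adding reducers does not help here, since the formula depends only on the key and all of the super node's records share one key.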

I have been trying to understand the partitioner, but I am at a loss as to
how to use it here.

How do I solve this straggler issue?