I have a very simple use case...
Basically, I have an edge list and I am trying to convert it into an adjacency list
and so on..
What I am trying to build is
.. and so on..
But every now and then I hit a super node, which has millions of edges.
Thus, keying on just the node id results in poor MR execution because of this.
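To make the problem concrete, here is a minimal sketch in plain Python (not my actual job) of what keying on just the node id does. The function names, the toy edge list, and the assumption that edges are directed (source, destination) pairs are all made up for illustration; the `shuffle` function only imitates how the framework groups values by key before reduce.

```python
from collections import defaultdict

def map_edge(edge):
    """Emit (source node id, destination node id) for one edge."""
    src, dst = edge
    return src, dst

def shuffle(pairs):
    """Group values by key, imitating the shuffle step before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_node(node_id, neighbours):
    """Build one adjacency row; a single call receives every edge of the node."""
    return node_id, sorted(neighbours)

# Hypothetical toy edge list: node 1 stands in for a super node.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (3, 4)]

groups = shuffle(map_edge(e) for e in edges)
adjacency = dict(reduce_node(k, v) for k, v in groups.items())

# Number of values each reduce call has to handle, per key.
group_sizes = {k: len(v) for k, v in groups.items()}
```

With a real super node, the group for that one key holds millions of values, so whichever reducer receives it keeps running long after the others have finished.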
I have been trying to understand the partitioner, but I am at a loss as to how to use it.
How do I solve this straggler issue?