Sorry for the previous incomplete message.
Here is take 2:
When I use a replicated join, only 2 map tasks get scheduled (compared to
100+ tasks for the other steps).
What is the idea behind this? What setting do I use to override this?
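A minimal sketch of why the count can be so low, assuming Pig's fragment-replicate semantics: the first relation in a `USING 'replicated'` join is split across map tasks, and the other relations are loaded whole into memory in every map. Under that assumption, the number of map tasks comes only from the input splits of the first relation. `count_splits` is a hypothetical helper loosely modeled on the new-API Hadoop `FileInputFormat` split rule and ignores any split combining Pig may do. `replicated_join` is an illustrative in-memory stand-in, not Pig's implementation.

```python
from collections import defaultdict

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last chunk of a file may be up to 10% larger than split_size


def count_splits(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """Approximate map-task count for one input (hypothetical helper), modeled on
    new-API FileInputFormat: split_size = max(min_size, min(max_size, block_size))."""
    split_size = max(min_size, min(max_size, block_size))
    total = 0
    for size in file_sizes:
        if size == 0:
            total += 1  # a zero-length file still yields one empty split
            continue
        remaining = size
        while remaining / split_size > SPLIT_SLOP:
            total += 1
            remaining -= split_size
        total += 1  # the final (possibly shorter) split
    return total


def replicated_join(fragments, replicated, key):
    """Map-side (fragment-replicate) join sketch: each fragment of the first
    relation plays the role of one map task, and each task probes an in-memory
    hash table built from the whole replicated relation. No reduce phase."""
    table = defaultdict(list)
    for row in replicated:
        table[row[key]].append(row)
    joined = []
    for fragment in fragments:  # one iteration == one map task
        for row in fragment:
            for match in table.get(row[key], ()):
                joined.append({**row, **match})
    return joined


# Hypothetical sizes: a 128 MB first relation on 64 MB blocks gives 2 map tasks,
# however large the replicated relation is.
print(count_splits([128 * MB], 64 * MB))  # -> 2
```

If this model holds, the lever is the first relation, not a task-count setting. Listing the large relation first (and keeping the small, memory-resident ones after it) is what spreads the join over many map tasks.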
Also, a basic question.
Does Hadoop decide the map task capacity, or does it simply follow the
Map Task Capacity   Reduce Task Capacity   Avg. Tasks/Node   Blacklisted Nodes
64                  20                     1.00
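A sketch of how these summary numbers relate, assuming classic (Hadoop 1.x-era) MapReduce where each TaskTracker advertises a fixed number of slots from its own `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` settings. Under that assumption, the capacity is the sum of configured slots, not something chosen per job. It caps how many maps run at once, while the number of maps in a job comes from its input splits. The cluster layout below is entirely hypothetical, and `cluster_capacity` and `map_waves` are illustrative helpers, not Hadoop APIs.

```python
import math


def cluster_capacity(trackers):
    """Cluster-wide slot totals as a JobTracker-style summary would show them.

    `trackers` is a list of (map_slots, reduce_slots) pairs, one per live
    TaskTracker; each value stands in for that node's configured maximum
    (assumed mapred.tasktracker.{map,reduce}.tasks.maximum)."""
    map_capacity = sum(m for m, _ in trackers)
    reduce_capacity = sum(r for _, r in trackers)
    return map_capacity, reduce_capacity


def map_waves(num_map_tasks, map_capacity):
    """Capacity limits concurrency, not the task count: maps beyond the
    available slots wait for a later wave."""
    return math.ceil(num_map_tasks / map_capacity)


# Hypothetical cluster: 16 nodes with 4 map slots each and mixed reduce slots.
trackers = [(4, 1)] * 12 + [(4, 2)] * 4
print(cluster_capacity(trackers))  # -> (64, 20)
print(map_waves(100, 64))          # -> 2
print(map_waves(2, 64))            # -> 1
```

In this model a job with only 2 splits uses 2 of the 64 map slots, so raising the capacity would not add map tasks to it.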