Spreading data in Pig
John Meek 2013-03-31, 16:06
Can anyone tell me how to solve the problem below in Pig?
I have 2 data sources:
TABLE A with a list of User IDs:
TABLE B with (Host name, Capacity):
I basically need to spread the user IDs in Table A across the hosts in Table B, so that each host receives exactly as many user IDs as its capacity.
So the end result should be a file:
The order does not matter as long as each host gets the number of user IDs its capacity allows. Also, SUM(TableB.Capacity) will always equal COUNT(TableA.UserID), so there won't be any extra or missing values to plug in.
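
To make the requirement concrete, here is a minimal sketch of the assignment I am after, written in plain Python rather than Pig. The function name, user IDs, and host names are made up for illustration only; they are not my real data.

```python
from itertools import chain, repeat

def spread(user_ids, capacities):
    """Assign each user ID to a host so every host gets exactly its capacity.

    user_ids:   list of user IDs (Table A)
    capacities: list of (host, capacity) pairs (Table B)
    """
    total = sum(cap for _, cap in capacities)
    if total != len(user_ids):
        raise ValueError("SUM(capacity) must equal COUNT(user IDs)")
    # Expand each host into one slot per unit of capacity,
    # e.g. ("hostA", 2) becomes "hostA", "hostA".
    slots = chain.from_iterable(repeat(host, cap) for host, cap in capacities)
    # Pair the i-th user ID with the i-th slot.
    return list(zip(user_ids, slots))

# Hypothetical sample data, for illustration only.
users = ["u1", "u2", "u3", "u4", "u5"]
hosts = [("hostA", 2), ("hostB", 3)]
result = spread(users, hosts)
for user, host in result:
    print(user, host)
```

In other words: expand each host into one row per unit of capacity, number the rows on both sides, and match them up by number. Which user lands on which host is arbitrary; only the per-host counts matter. How to express that expansion and matching in Pig is the part I am unsure of.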