Spreading data in Pig

Hey all,

Can anyone let me know how I can solve the problem below in Pig?

I have 2 data sources:

TABLE A with a list of User IDs:

User1
User2
User3
User4
User5
User6
User7
User8
User9

TABLE B with (Host name, Capacity):

Hostb 2
Hostc 4
Hostd 3

I need to distribute the users in Table A across the hosts in Table B so that each host receives exactly as many users as its capacity.

So end result should be a file:

User1 Hostb
User2 Hostb
User3 Hostc
User4 Hostc
User5 Hostc
User6 Hostc
User7 Hostd
User8 Hostd
User9 Hostd

The order does not matter as long as each host gets exactly the number of users its capacity allows. Also, SUM(TableB.Capacity) will always equal COUNT(TableA.UserID), so there won't be any leftover users or unfilled slots.
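
To make the intended assignment concrete, here is a minimal sketch of the logic in plain Python rather than Pig. It assumes both tables fit in memory. The function name `spread_users` and the in-memory list representation are hypothetical, chosen only for illustration. It expands each host into one slot per unit of capacity and pairs the i-th user with the i-th slot.

```python
from itertools import chain, repeat


def spread_users(users, hosts):
    """Assign each user to a host so every host gets exactly `capacity` users.

    users: list of user IDs (Table A)
    hosts: list of (host_name, capacity) pairs (Table B)
    """
    # The post states that total capacity always equals the user count;
    # fail loudly if that assumption does not hold.
    total_capacity = sum(capacity for _, capacity in hosts)
    if total_capacity != len(users):
        raise ValueError("sum of capacities must equal the number of users")

    # Expand each host into one slot per unit of capacity,
    # e.g. ("Hostb", 2) becomes "Hostb", "Hostb".
    slots = chain.from_iterable(repeat(name, capacity) for name, capacity in hosts)

    # Pair the i-th user with the i-th slot.
    return list(zip(users, slots))


users = [f"User{i}" for i in range(1, 10)]
hosts = [("Hostb", 2), ("Hostc", 4), ("Hostd", 3)]

for user, host in spread_users(users, hosts):
    print(user, host)
```

With the sample data above, this prints the nine pairs shown in the expected result. The same idea might carry over to Pig by expanding Table B into one row per slot, numbering the rows of both relations, and joining on that number. That translation is an assumption and is not shown here.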
thanks,
JM