Pig >> mail # user >> Bag of tuples

Hi Pig experts,
Sorry to post so many questions. I have one more, about running analytics on bags of tuples.

My input has the following format:

{(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
{(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
{(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
{(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */

I can change my UDF to produce simpler output if needed. However, I want to find out whether something like this can be done easily:
I would like to find the top 5 ids (the first field of each tuple) across all users. Note that each user has one bag, and the first field of every tuple in that bag is an id.
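
To make the goal concrete, here is a minimal sketch in plain Python (not Pig) of the result I am after, using the sample bags above. It assumes that "top" means most frequently occurring, that every occurrence counts (so id6 counts twice for User 4), and that ties are broken alphabetically by id; the names `USERS` and `top_ids` are made up for illustration.

```python
from collections import Counter

# Sample data from the post: one bag (a list of tuples) per user.
USERS = [
    [("id1", "x", "y", "z"), ("id2", "a", "b", "c"), ("id3", "x", "a")],   # User 1
    [("id10", "x", "y", "z"), ("id9", "a", "b", "c"), ("id1", "x", "a")],  # User 2
    [("id8", "x", "y", "z"), ("id4", "a", "b", "c"), ("id2", "x", "a")],   # User 3
    [("id6", "x", "y", "z"), ("id6", "a", "b", "c"), ("id9", "x", "a")],   # User 4
]


def top_ids(bags, n=5):
    """Return the n most frequent ids as (id, count) pairs.

    Assumption: every tuple occurrence counts, and ties are broken
    by comparing the id strings alphabetically.
    """
    # Flatten all bags and count the first field of each tuple.
    counts = Counter(t[0] for bag in bags for t in bag)
    # Sort by descending count, then by id for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for id_, count in top_ids(USERS):
        print(id_, count)
```

In Pig terms I imagine this corresponds to flattening each bag, grouping by the id field, counting each group, ordering by the count, and limiting to 5, but I am not sure how to express that when the tuples in a bag have different lengths.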

How difficult would it be to filter on fields of the tuples and run analytics across the entire user base?