Pig user mailing list
RE: Ideas for data processing
Sameer, did you check out the TOMAP function in the documentation? The example there is close to yours. I think a nested FOREACH in combination with TOMAP would get you there, though I haven't tried it myself.
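This is a rough Python analogue of that suggestion, not Pig code and not tested against Pig. It groups the rows by user and then builds a per-user map from item name to value. Sorting combined with `itertools.groupby` plays the role of Pig's GROUP ... BY. The dict comprehension plays the role of TOMAP inside the nested FOREACH. `group_by_user` is a hypothetical helper name, and the input is assumed to be a list of (user, item, value) string tuples.

```python
from itertools import groupby
from operator import itemgetter


def group_by_user(records):
    """Sort-then-group sketch (hypothetical helper, Python analogue only).

    records: iterable of (user, item, value) tuples.
    Returns {user: {item: value}}, i.e. one map per user.
    """
    # Sorting brings every row for a user together, roughly as the
    # shuffle behind Pig's GROUP ... BY does on a cluster.
    ordered = sorted(records, key=itemgetter(0))
    result = {}
    for user, rows in groupby(ordered, key=itemgetter(0)):
        # Analogue of TOMAP in a nested FOREACH: item name -> value,
        # built only from this user's rows.
        result[user] = {item: value for _, item, value in rows}
    return result


print(group_by_user([("user1", "item1", "10"),
                     ("user2", "item1", "5"),
                     ("user1", "item2", "7")]))
```

Users with no entry for an item simply lack that key in their map, which matches "not all users have entries for all the items."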

Steve Bernstein

408.499.0961 MOBILE

-----Original Message-----
From: Sameer Tilak [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 03, 2014 2:00 PM
Subject: Ideas for data processing

Hi everyone,
We have a data set in the following format:
user1    item1    value
user2    item1    value
user3    item1    value
...
user1    item2    value
user20   item2    value
user35   item2    value
...
user2    item3    value
user25   item3    value
...
We have around 20 items and millions of users, and not all users have entries for all the items. We would like to transform this into:
user1 item1 value, item2 value, item3 value, ...
user2 item4 value, item18 value, item19 value, ...
I can think of a couple of ways to do this in Pig Latin. For example, one way would be to create a map (where the key is the item name and the value is the associated value), fill out that map as you read the data, and then write it out to a file. I am not sure how efficient that would be. I would love to get suggestions for doing this in Pig Latin.
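Here is a minimal Python sketch of the "fill out a map as you read the data" idea, for illustration only and not Pig Latin. It assumes each input line is "user item value" separated by whitespace. `pivot_lines` is a hypothetical name. It makes a single pass that accumulates user -> {item: value}, then emits one output line per user. Note that the whole map lives in one process's memory. With millions of users and up to about 20 items each, that is probably where the efficiency concern lies, and a grouped approach that partitions by user lets the work spread across machines instead.

```python
from collections import defaultdict


def pivot_lines(lines):
    """Single-pass, in-memory pivot (hypothetical helper, illustration only).

    Assumes each line is 'user item value'; blank or malformed lines are skipped.
    Returns one 'user item value, item value, ...' string per user.
    """
    per_user = defaultdict(dict)  # user -> {item: value}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user, item, value = parts
        per_user[user][item] = value  # a later duplicate overwrites an earlier one
    out = []
    for user in sorted(per_user):
        # Items are sorted as plain strings, so "item18" sorts before "item2".
        pairs = ", ".join(f"{item} {value}"
                          for item, value in sorted(per_user[user].items()))
        out.append(f"{user} {pairs}")
    return out


for row in pivot_lines(["user1 item1 10", "user2 item1 5",
                        "user1 item2 7", "", "user2 item4 3"]):
    print(row)
```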