Pig, mail # user - Ideas for data processing


RE: Ideas for data processing
Sameer Tilak 2014-02-05, 15:20
Steve, thanks. Will try that now.

> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: RE: Ideas for data processing
> Date: Tue, 4 Feb 2014 17:57:44 +0000
>
> Sameer, did you check out the TOMAP function in the documentation? The example there is close to yours. I think a nested FOREACH combined with TOMAP would get you there, though I haven't tried it myself.
> SB
>
> ______________________
> Steve Bernstein
> VP/Analytics
>
> 408.499.0961 MOBILE
> deem.com
>
> -----Original Message-----
> From: Sameer Tilak [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 03, 2014 2:00 PM
> To: [EMAIL PROTECTED]
> Subject: Ideas for data processing
>
> Hi everyone,
> We have a data set in the following format:
> user1    item1    value
> user2    item1    value
> user3    item1    value
> ...
> user1    item2    value
> user20   item2    value
> user35   item2    value
> ...
> user2    item3    value
> user25   item3    value
> ...
> We have around 20 items and millions of users, and not all users have entries for all items. We would like to transform this into:
> user1    item1 value, item2 value, item3 value, ...
> user2    item4 value, item18 value, item19 value, ...
> I can think of a couple of ways of doing this in Pig Latin. For example, one way would be to create a map (where the key is the item name and the value is the associated value), fill out that map as we read the data, and then write it out to a file. I am not sure how efficient that will be, so I would love to get suggestions for doing this in Pig Latin.
>
>
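The regrouping asked about here (one row per user/item pair in, one record per user out) is essentially a group-by on the user column followed by collecting each user's item/value pairs into a map. The Python sketch below only illustrates that logic on a single machine, assuming whitespace-separated `user item value` input lines. It is not Pig Latin and not the solution from this thread, and the function names `parse_line`, `group_by_user` and `format_record`, as well as the tab-separated output layout, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def parse_line(line):
    # Assumption: each input line is "user item value", separated by whitespace.
    user, item, value = line.split()
    return user, item, value


def group_by_user(rows):
    """Collect (user, item, value) triples into {user: {item: value}}.

    This plays the role of the per-user map described in the message:
    the outer dict is the group-by on user, each inner dict is the map.
    """
    per_user = defaultdict(dict)
    for user, item, value in rows:
        # A repeated (user, item) pair overwrites the earlier value.
        per_user[user][item] = value
    return per_user


def format_record(user, items):
    """Render one output line: 'user<TAB>item1 value, item2 value, ...'.

    Items appear in the order they were first read (dicts keep insertion order).
    """
    pairs = ", ".join(f"{item} {value}" for item, value in items.items())
    return f"{user}\t{pairs}"


if __name__ == "__main__":
    # Made-up sample input, ordered by item as in the message above.
    sample = """user1 item1 10
user2 item1 20
user1 item2 30
user2 item3 40"""
    rows = (parse_line(l) for l in sample.splitlines() if l.strip())
    grouped = group_by_user(rows)
    for user, items in grouped.items():
        print(format_record(user, items))
    # prints:
    # user1	item1 10, item2 30
    # user2	item1 20, item3 40
```

Users with no entry for an item simply have no key for it, which matches the sparse output the message asks for. This single-process version keeps every user's map in memory, which is exactly the cost that a GROUP ... BY user in Pig spreads across reducers when there are millions of users.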