Pig, mail # user - Ordered partitioned data

Ordered partitioned data
Ahmed Eldawy 2013-05-13, 14:52
  I have a dataset with three columns: group_id, position, and name. For
each group, I need to generate a concatenated string of all names ordered
by their position. I can do this by sorting all data by position (or by
group_id and position), then grouping by group_id, and finally
concatenating the names in each group. I have two questions:
1- Does this really work? In other words, does the GROUP BY operator retain
2- What is the most efficient way to do it? Is it better, if possible, to
group first and then sort? Let's say I order by the pair (group_id,
position) first; can Pig be given a hint about this to make the GROUP BY
faster?
Thanks for your help
Best regards,
Ahmed Eldawy
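
To make the expected result concrete, here is a minimal sketch in plain Python (not Pig) of the output described above. The function name `concat_names_by_group`, the comma separator, and the sample rows are assumptions made up for illustration. It follows the "group first, then sort within each group" variant from question 2, so the result does not depend on the order of the input.

```python
from collections import defaultdict


def concat_names_by_group(rows, sep=","):
    """Return {group_id: names joined in position order}.

    rows is an iterable of (group_id, position, name) tuples.
    The function name and the separator are illustrative assumptions.
    """
    groups = defaultdict(list)
    for group_id, position, name in rows:
        groups[group_id].append((position, name))

    # Sort inside each group, so no ordering of the input is assumed.
    return {
        group_id: sep.join(
            name for _, name in sorted(members, key=lambda m: m[0])
        )
        for group_id, members in groups.items()
    }


# Hypothetical sample data, deliberately out of order.
rows = [(1, 2, "b"), (2, 1, "x"), (1, 1, "a"), (2, 2, "y"), (1, 3, "c")]
result = concat_names_by_group(rows)
```

With this sample input, group 1 yields its names in position order 1, 2, 3 and group 2 in order 1, 2, regardless of how the rows were arranged beforehand.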