Pig >> mail # user >> Ordered partitioned data


Ordered partitioned data
Hi,
  I have a dataset with three columns: group_id, position, and name. For each
group, I need to generate a concatenated string of all names ordered by their
position. I can do this by sorting all the data by position (or by group_id
and position), then grouping by group_id, and finally concatenating the names
in each group. I have two questions:
1- Does this really work? In other words, does the GROUP BY operator retain
the order of its input?
2- What is the most efficient way to do it? Is it better, if possible, to
group first and then sort? If I order by the pair (group_id, position)
first, can I hint this to Pig to make the GROUP BY faster?
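
To make the goal concrete, here is a minimal sketch in plain Python (not Pig)
of the result I am after. The function name, the sample rows, and the comma
separator are made up for illustration. It groups first and then sorts inside
each group, so it does not depend on grouping preserving any earlier order.

```python
from collections import defaultdict


def concat_names_by_group(rows, sep=","):
    """Return {group_id: names joined in ascending position order}.

    rows is an iterable of (group_id, position, name) tuples.
    """
    # Collect (position, name) pairs per group, in arrival order.
    groups = defaultdict(list)
    for group_id, position, name in rows:
        groups[group_id].append((position, name))

    # Sort each group by position, then join the names.
    return {
        group_id: sep.join(
            name for _, name in sorted(members, key=lambda m: m[0])
        )
        for group_id, members in groups.items()
    }


# Hypothetical sample data, deliberately out of order.
rows = [(1, 2, "b"), (2, 1, "x"), (1, 1, "a"), (2, 2, "y")]
result = concat_names_by_group(rows)
print(result)  # {1: 'a,b', 2: 'x,y'}
```

Whether Pig can produce the same output more cheaply by sorting before the
grouping is exactly what question 2 asks.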
Thanks for your help
Best regards,
Ahmed Eldawy
Cheolsoo Park 2013-05-13, 17:18
Ahmed Eldawy 2013-05-13, 19:21