Pig, mail # user - What is the best way to do counting in pig?


What is the best way to do counting in pig?
Sheng Guo 2012-07-02, 18:42
Hi all,

I have been using the following Pig script to count records.

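-- group the filtered skill rows by member
m_skill_group = group m_skills_filter by member_id;
-- collapse every group into a single bag
grpd = group m_skill_group all;
-- count the tuples in that bag (one per member_id group)
cnt = foreach grpd generate COUNT(m_skill_group);

cnt_filter = limit cnt 10;
dump cnt_filter;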
However, when the number of records gets large, the job sometimes takes a
very long time, hangs, or dies.
I thought counting should be simple enough, so what is the best way to do
counting in Pig?

Thanks!

Sheng