Pig >> mail # user >> What is the best way to do counting in pig?


Re: What is the best way to do counting in pig?
Is your goal to have the 10 largest rows by member_id?

2012/7/2 Sheng Guo <[EMAIL PROTECTED]>

> Hi all,
>
> I used to use the following pig script to do the counting of the records.
>
> m_skill_group = group m_skills_filter by member_id;
> grpd = group m_skill_group all;
> cnt = foreach grpd generate COUNT(m_skill_group);
>
> cnt_filter = limit cnt 10;
> dump cnt_filter;
>
>
> but sometimes, when the number of records gets larger, it takes a lot of time
> and hangs or dies.
> I thought counting should be simple enough, so what is the best way to do
> counting in pig?
>
> Thanks!
>
> Sheng
>
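
For reference, the quoted script groups by member_id and then counts the groups, so it yields a single row (the number of distinct member_id values) and the limit 10 has no effect on it. Below is a minimal sketch of two alternatives, depending on which count is wanted. It assumes m_skills_filter has a member_id field, as in the quoted script; the aliases ids, uniq_ids, all_ids, n_members, by_member, per_member, ordered and top10 are made up for illustration. Whether either variant is actually faster depends on the data and the cluster, so treat this as a starting point rather than a definitive answer.

```pig
-- Variant A: number of distinct member_id values.
-- Project only member_id first so full records are not carried through the shuffle.
ids       = foreach m_skills_filter generate member_id;
uniq_ids  = distinct ids;
all_ids   = group uniq_ids all;
n_members = foreach all_ids generate COUNT(uniq_ids);
dump n_members;

-- Variant B: number of records per member_id, keeping the 10 largest.
by_member  = group m_skills_filter by member_id;
per_member = foreach by_member generate group as member_id,
                                        COUNT(m_skills_filter) as n;
ordered    = order per_member by n desc;
top10      = limit ordered 10;
dump top10;
```

Note that COUNT skips tuples whose first field is null; COUNT_STAR can be used instead if those rows should be included.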