Pig, mail # user - What is the best way to do counting in pig?


Re: What is the best way to do counting in pig?
Sheng Guo 2012-07-02, 21:32
I guess that's the reason. Using a single reducer can cause problems when
the data is huge: the count becomes very time-consuming, or the job even
dies at the end.

What do you mean by using COUNT_STAR to include NULL fields? Can you explain
a little more? What is the difference between it and the standard COUNT
in terms of job execution?

Thanks!

On Mon, Jul 2, 2012 at 1:51 PM, Subir S <[EMAIL PROTECTED]> wrote:

> GROUP ALL uses a single reducer, AFAIU. You could try counting per group
> and then summing the counts.
>
> You may also try COUNT_STAR, which includes NULL fields.
>
> On 7/3/12, Sheng Guo <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I have been using the following Pig script to count records.
> >
> > m_skill_group = group m_skills_filter by member_id;
> > grpd = group m_skill_group all;
> > cnt = foreach grpd generate COUNT(m_skill_group);
> >
> > cnt_filter = limit cnt 10;
> > dump cnt_filter;
> >
> >
> > But sometimes, when the number of records gets larger, it takes a lot
> > of time and hangs or dies.
> > I thought counting should be simple enough, so what is the best way to
> > count in Pig?
> >
> > Thanks!
> >
> > Sheng
> >
>
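
For reference, the two suggestions in this thread (count per group and then sum, and COUNT_STAR versus COUNT) might look roughly like the sketch below. The LOAD path and schema are assumptions made only so the script is self-contained; the thread does not show how m_skills_filter is built. The aliases by_member, per_member, all_counts, totals, all_rows and cmp are likewise hypothetical. Note that the original script counts groups (one per member_id), not records, so this sketch reports both numbers.

```pig
-- Hypothetical input: the path and schema are assumptions, not from the thread.
m_skills_filter = LOAD 'member_skills.tsv' USING PigStorage('\t')
                  AS (member_id:long, skill:chararray);

-- 1) Count per group, then sum.
--    The per-member counts are spread across reducers; only one small
--    (member_id, n) row per member reaches the final GROUP ALL step.
by_member  = GROUP m_skills_filter BY member_id;
per_member = FOREACH by_member GENERATE
                 group AS member_id,
                 COUNT_STAR(m_skills_filter) AS n;
all_counts = GROUP per_member ALL;
totals     = FOREACH all_counts GENERATE
                 COUNT_STAR(per_member) AS num_members,  -- number of groups
                 SUM(per_member.n)      AS num_records;  -- number of input rows
DUMP totals;

-- 2) COUNT vs COUNT_STAR on the raw relation.
--    COUNT skips tuples whose first field (here member_id) is null;
--    COUNT_STAR counts every tuple in the bag.
all_rows = GROUP m_skills_filter ALL;
cmp      = FOREACH all_rows GENERATE
               COUNT(m_skills_filter)      AS non_null_first_field,
               COUNT_STAR(m_skills_filter) AS every_row;
DUMP cmp;
```

As far as I understand, the two functions differ only in how they treat nulls, not in how the job is planned: both are algebraic, so Pig can normally compute partial counts map-side with the combiner before the single GROUP ALL reducer sees anything. The LIMIT in the original script should not be needed either, since GROUP ALL yields a single row.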