Pig, mail # user - What is the best way to do counting in pig?


Re: What is the best way to do counting in pig?
Jonathan Coveney 2012-07-03, 16:56
Instead of "dump relation", run "explain relation" (keeping the rest of the
script identical) and paste the output here. It will show whether the
combiner is being used.
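
A minimal sketch of that check, reusing the aliases from the script quoted further down. The load and filter statements are assumptions added only to make the example self-contained (the thread does not show how m_skills_filter was built), and the input path and schema are hypothetical.

```pig
-- Hypothetical input path and schema; not taken from the thread.
m_skills = load 'member_skills.tsv' as (member_id:long, skill:chararray);
-- Hypothetical filter standing in for however m_skills_filter was produced.
m_skills_filter = filter m_skills by member_id is not null;

-- The same three statements as in the quoted script.
m_skill_group = group m_skills_filter by member_id;
grpd = group m_skill_group all;
cnt = foreach grpd generate COUNT(m_skill_group);

-- Print the logical, physical, and MapReduce plans instead of running the job.
explain cnt;
```

In the MapReduce plan part of the output, the job that computes COUNT should list a combine plan alongside its map and reduce plans if the combiner is in use; if that part is missing, the combiner is likely not being applied.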

2012/7/3 Ruslan Al-Fakikh <[EMAIL PROTECTED]>

> Hi,
>
> As was said, COUNT is algebraic and should be fast because it lets Pig
> use the combiner. You should make sure that the combiner is really being
> used here; it can be disabled in some situations. I've run into many
> cases where a job becomes too heavy when no combiner is applied.
>
> Ruslan
>
> On Tue, Jul 3, 2012 at 1:35 AM, Subir S <[EMAIL PROTECTED]> wrote:
> > Right!!
> >
> > Since the job is reported to be hanging, my wild guess is that the
> > cause is 'group all'. How can that be confirmed?
> >
> > On 7/3/12, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >> group all uses a single reducer, but COUNT is algebraic and, as such,
> >> will use combiners, so it is generally quite fast.
> >>
> >> 2012/7/2 Subir S <[EMAIL PROTECTED]>
> >>
> >>> Group all uses a single reducer, AFAIU. You could try counting per
> >>> group and then summing, maybe.
> >>>
> >>> You may also try with COUNT_STAR to include NULL fields.
> >>>
> >>> On 7/3/12, Sheng Guo <[EMAIL PROTECTED]> wrote:
> >>> > Hi all,
> >>> >
> >>> > I used to use the following pig script to do the counting of the
> >>> > records.
> >>> >
> >>> > m_skill_group = group m_skills_filter by member_id;
> >>> > grpd = group m_skill_group all;
> >>> > cnt = foreach grpd generate COUNT(m_skill_group);
> >>> >
> >>> > cnt_filter = limit cnt 10;
> >>> > dump cnt_filter;
> >>> >
> >>> >
> >>> > but sometimes, when the number of records gets larger, it takes a
> >>> > long time and hangs or dies.
> >>> > I thought counting should be simple enough, so what is the best way
> >>> > to count records in Pig?
> >>> >
> >>> > Thanks!
> >>> >
> >>> > Sheng
> >>> >
> >>>
> >>
>
>
>
> --
> Best Regards,
> Ruslan Al-Fakikh
>