Pig, mail # user - Could not infer the matching function for org.apache.pig.builtin.COUNT

Re: Could not infer the matching function for org.apache.pig.builtin.COUNT
Prashant Kommireddi 2012-03-22, 21:40
Aggregation functions (COUNT, SUM, AVG, ...) work on bags. Since you are
counting over the entire relation, you use GROUP ALL, which, as you said,
forms a single bag out of all the tuples.

grunt> A = load 'data' as (a:int, b:int);
grunt> describe A;
A: {a: int,b: int}

Once the GROUP operator is applied, Pig implicitly names the first field of
the resulting relation 'group'. The second field takes the name of the
relation you grouped (in this example 'A', in your case 'productscans') and
holds the bag of tuples.

grunt> B = group A ALL;
grunt> describe B;
B: {group: chararray,A: {(a: int,b: int)}}

So you can now reference the bag either by the alias 'A' or by the
positional notation '$1'. Note that $0 refers to the implicit field 'group'
generated by Pig.

grunt> C = foreach B generate group, COUNT(A);
OR
grunt> C = foreach B generate group, COUNT($1);

Thanks,
Prashant

You are counting against 'productscans' because that is the original
relation you grouped.
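
Applied to your script, that also explains the original error. A sketch using
the schema from your LOAD statement (not run against your data):

```pig
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
    (thetime:long, product_id:long, lat:double, lon:double,
     user:chararray, category:chararray, title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');

-- In the original script, FOREACH ran directly on productscans, so $0 was
-- the first field of each tuple (thetime, a long), not a bag:
--   scancount = FOREACH productscans GENERATE COUNT($0);
-- COUNT only accepts a bag, so Pig could not infer a matching function.

-- Grouping first turns the filtered tuples into one bag named productscans.
grpd = GROUP productscans ALL;
DESCRIBE grpd;   -- expect two fields: group, and the bag productscans

-- Now COUNT is handed a bag, which is what it expects.
scancount = FOREACH grpd GENERATE COUNT(productscans);
DUMP scancount;
```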

On Thu, Mar 22, 2012 at 1:46 PM, Jason Alexander <[EMAIL PROTECTED]> wrote:

> Very nice, worked like a champ, Prashant.
>
> Any chance you could explain why? I'd love to be taught to fish, not just
> given the fish to eat. ;-)
>
> GROUP ALL, as I read it, pulls the tuples into a single group. But,
> FOREACH'ing on each group, and counting against productscans is where my
> brain starts to hurt.
>
>
> Thanks again for your help!
> -Jason
>
>
> On Mar 22, 2012, at 3:33 PM, Prashant Kommireddi wrote:
>
> > Hi Jason,
> >
> > Are you trying to count the number of records in the relation
> > 'productscans'? In which case you would have to use GROUP
> > http://pig.apache.org/docs/r0.9.1/basic.html#GROUP
> >
> > grpd = GROUP productscans ALL;
> > scancount = FOREACH grpd GENERATE COUNT(productscans);
> > DUMP scancount;
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 1:28 PM, Jason Alexander <[EMAIL PROTECTED]> wrote:
> >
> >> Hey all,
> >>
> >>
> >> I'm trying to write a script to pull the count of a dataset that I've
> >> filtered.
> >>
> >> Here's the script so far:
> >>
> >> /* scans by title */
> >>
> >> scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
> >> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
> >> productscans = FILTER scans BY (title MATCHES 'proactiv');
> >> scancount = FOREACH productscans GENERATE COUNT($0);
> >> DUMP scancount;
> >>
> >> For some reason, I get the error:
> >>
> >> Could not infer the matching function for org.apache.pig.builtin.COUNT as
> >> multiple or none of them fit. Please use an explicit cast.
> >>
> >> What am I doing wrong here? I'm assuming it has something to do with the
> >> type of the field I'm passing in, but I can't seem to resolve this.
> >>
> >>
> >> TIA,
> >> -Jason