Pig, mail # user - Count grouped by title


Re: Count grouped by title
Prashant Kommireddi 2012-03-26, 17:43
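After a GROUP, the grouping key is exposed as an implicit field named
'group', so you need to project 'group' rather than title. At that point
title exists only inside the productscans bag. The error was pretty clear
in this case: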

grunt> scancount = FOREACH groupedscans GENERATE title, COUNT(productscans);
2012-03-26 10:41:43,497 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 5, column 56> Invalid field projection. Projected field [title] does not exist in schema:
group:chararray,productscans:bag{:tuple(thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray)}.
Instead use 'group'

grunt> scancount = FOREACH groupedscans GENERATE group, COUNT(productscans);
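
For reference, here is a rough sketch of how the whole script might look with that change applied. The paths and schema are copied from your original script. The `AS title` rename and the `scan_count` alias are my own additions for readability, not something Pig requires, so treat this as one possible shape rather than the definitive version.

```pig
/* scans grouped by title (sketch with the 'group' fix applied) */

scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS
               (thetime:long, product_id:long, lat:double, lon:double,
                user:chararray, category:chararray, title:chararray);
productscans = FILTER scans BY (title MATCHES 'battery');
groupedscans = GROUP productscans BY title;

-- DESCRIBE shows the post-GROUP schema: the key is named 'group' and the
-- original rows sit in a bag named after the grouped relation.
DESCRIBE groupedscans;

-- Project the key via 'group'. Renaming it back to 'title' and naming the
-- count 'scan_count' are optional, hypothetical aliases.
scancount    = FOREACH groupedscans GENERATE group AS title,
               COUNT(productscans) AS scan_count;

STORE scancount INTO '/output/scans/groupedscans.out';
```

Renaming the key with AS only changes the output schema, so later statements can refer to title again. The grouping itself is unchanged.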

Thanks,
Prashant

On Mon, Mar 26, 2012 at 10:39 AM, Jason Alexander <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
>
>
> Continuing on in my Pig education, I'm trying to pivot my previous script
> to give me a breakdown of counts by title.
>
> The script I have so far is:
>
> /* scans grouped by title */
>
> scans                   = LOAD '/hive/scans/*' USING PigStorage(',') AS
> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
> productscans    = FILTER scans BY (title MATCHES 'battery');
> groupedscans    = GROUP productscans BY title;
> scancount               = FOREACH groupedscans GENERATE title,
> COUNT(productscans);
> --DUMP scancount;
> STORE scancount INTO '/output/scans/groupedscans.out';
>
>
>
> I'm sure it's something goofy and easy, but any help would be much
> appreciated!
>
>
> Thanks,
> -Jason