Pig >> mail # user >> Count grouped by title

Re: Count grouped by title
After a GROUP, the grouping key is exposed as the implicit field 'group', so
you need to use 'group' rather than title in the FOREACH. The error message
says as much:

grunt> scancount = FOREACH groupedscans GENERATE title, COUNT(productscans);
2012-03-26 10:41:43,497 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1025:
<line 5, column 56> Invalid field projection. Projected field [title] does
not exist in schema:
group:chararray,productscans:bag{:tuple(thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray)}.
Instead use 'group'

grunt> scancount = FOREACH groupedscans GENERATE group, COUNT(productscans);
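
If you would still like the output column to be called title, one option is
to alias the implicit field in the same statement. This is only a sketch
against the relations from the script quoted below; the name scan_count is
made up here for illustration and is not part of the original script.

```pig
-- Rename the implicit grouping key back to 'title' and name the count.
-- 'scan_count' is a hypothetical alias chosen for this example.
scancount = FOREACH groupedscans GENERATE group AS title, COUNT(productscans) AS scan_count;

-- Print the resulting schema to confirm the field names.
DESCRIBE scancount;
```

The aliases only change the field names in the schema; the grouping and the
counts are the same as in the statement above.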

Thanks,
Prashant

On Mon, Mar 26, 2012 at 10:39 AM, Jason Alexander <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
>
>
> Continuing on in my Pig education, I'm trying to pivot my previous script
> to give me a breakdown of counts by title.
>
> The script I have so far is:
>
> /* scans grouped by title */
>
> scans                   = LOAD '/hive/scans/*' USING PigStorage(',') AS
> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
> productscans    = FILTER scans BY (title MATCHES 'battery');
> groupedscans    = GROUP productscans BY title;
> scancount               = FOREACH groupedscans GENERATE title,
> COUNT(productscans);
> --DUMP scancount;
> STORE scancount INTO '/output/scans/groupedscans.out';
>
>
>
> I'm sure it's something goofy and easy, but any help would be much
> appreciated!
>
>
> Thanks,
> -Jason