Pig user mailing list: Count grouped by title


Jason Alexander 2012-03-26, 17:39
Prashant Kommireddi 2012-03-26, 17:43
Jason Alexander 2012-03-26, 19:02

Re: Count grouped by title
Pig uses Java's regular expression format, and MATCHES requires the regex to
match the whole string-to-be-searched, as if it were anchored at both the
beginning and the end.  This means that the predicate ...matches 'battery'
only returns strings that are exactly "battery", instead of strings that
contain "battery".

Try using ...matches '.*battery.*' instead.
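To see the difference outside Pig, here is a small plain-Java sketch. It assumes, as described above, that MATCHES behaves like Java's full-string Pattern.matches rather than a substring search. The class name PigMatchesDemo, the helpers fullMatch and containsMatch, and the sample title are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class PigMatchesDemo {

    // Full-string match: the regex must account for every character of the
    // input, which is how MATCHES is described as behaving in this thread.
    static boolean fullMatch(String input, String regex) {
        return Pattern.matches(regex, input);
    }

    // Substring search ("contains" semantics), shown only for contrast.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String title = "battery title 1"; // hypothetical sample title

        System.out.println(fullMatch(title, "battery"));       // false: text follows "battery"
        System.out.println(fullMatch(title, ".*battery.*"));   // true: .* absorbs the rest
        System.out.println(fullMatch("battery", "battery"));   // true: exact string only
        System.out.println(containsMatch(title, "battery"));   // true: find() looks for a substring
    }
}
```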

Norbert

On Mon, Mar 26, 2012 at 3:02 PM, Jason Alexander <[EMAIL PROTECTED]> wrote:

> Thanks Prashant,
>
>
> Well, before, I wasn't getting any specific error; I was just getting
> nothing written out.
>
> Updating the script based on your feedback, the output I get is:
>
> battery 303
>
> Which I assume is the total number of records that have the word "battery"
> in the title.
>
> Ultimately, what I would like to see is:
>
> battery title 1                 15
> battery title 2                 304
> battery title 3                 573
> .
> .
> .
>
>
> How can I accomplish that?
>
>
> Thanks again for all your help,
> -Jason
>
> On Mar 26, 2012, at 12:43 PM, Prashant Kommireddi wrote:
>
> > You need to use the implicit 'group' to reference title. The error was
> > pretty clear in this case.
> >
> > grunt> scancount               = FOREACH groupedscans GENERATE title, COUNT(productscans);
> > 2012-03-26 10:41:43,497 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> > <line 5, column 56> Invalid field projection. Projected field [title] does not exist in schema: group:chararray,productscans:bag{:tuple(thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray)}.
> >
> >
> > Instead use 'group'
> >
> > grunt> scancount               = FOREACH groupedscans GENERATE group, COUNT(productscans);
> >
> > Thanks,
> > Prashant
> >
> > On Mon, Mar 26, 2012 at 10:39 AM, Jason Alexander <[EMAIL PROTECTED]> wrote:
> >
> >> Hey guys,
> >>
> >>
> >>
> >> Continuing on in my Pig education, I'm trying to pivot my previous script
> >> to give me a breakdown of count by title.
> >>
> >> The script I have so far is:
> >>
> >> /* scans grouped by title */
> >>
> >> scans                   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
> >> productscans    = FILTER scans BY (title MATCHES 'battery');
> >> groupedscans    = GROUP productscans BY title;
> >> scancount               = FOREACH groupedscans GENERATE title, COUNT(productscans);
> >> --DUMP scancount;
> >> STORE scancount INTO '/output/scans/groupedscans.out';
> >>
> >>
> >>
> >> I'm sure it's something goofy and easy, but any help would be much
> >> appreciated!
> >>
> >>
> >> Thanks,
> >> -Jason
>
>
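Putting both replies together, the corrected script would filter with MATCHES '.*battery.*', group by title, and generate group, COUNT(productscans), giving one output row per distinct title. The plain-Java sketch below imitates that FILTER / GROUP / FOREACH pipeline on an in-memory list, assuming full-string regex matching as described above; the class CountByTitle, the method countByTitle, and the sample titles are hypothetical. It also suggests why the original script printed a single battery row: with the pattern 'battery', only titles exactly equal to "battery" survive the filter, so they all land in one group.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CountByTitle {

    // Imitates, on an in-memory list of titles:
    //   FILTER ... BY (title MATCHES regex)
    //   GROUP ... BY title
    //   FOREACH ... GENERATE group, COUNT(...)
    static Map<String, Long> countByTitle(List<String> titles, String regex) {
        return titles.stream()
                // String.matches is a full-string match, like MATCHES
                .filter(title -> title.matches(regex))
                .collect(Collectors.groupingBy(
                        title -> title,          // grouping key, i.e. Pig's "group" field
                        TreeMap::new,            // sorted keys, only for readable output
                        Collectors.counting())); // size of each group, like COUNT
    }

    public static void main(String[] args) {
        // Hypothetical sample titles, not data from the thread
        List<String> titles = List.of(
                "battery", "battery title 1", "battery title 2",
                "battery title 2", "charger");

        // Original pattern: only the exact title "battery" survives
        System.out.println(countByTitle(titles, "battery"));     // {battery=1}
        // Corrected pattern: one row per title containing "battery"
        System.out.println(countByTitle(titles, ".*battery.*")); // {battery=1, battery title 1=1, battery title 2=2}
    }
}
```

In the sketch, the map key plays the role of Pig's implicit group field, which is why the Pig FOREACH must project group rather than title after the GROUP.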
Jason Alexander 2012-03-26, 19:50