Pig >> mail # user >> Count grouped by title


Jason Alexander 2012-03-26, 17:39
Prashant Kommireddi 2012-03-26, 17:43
Jason Alexander 2012-03-26, 19:02
Re: Count grouped by title
Pig's MATCHES operator uses Java regular expressions and requires the
pattern to match the entire string, as if the regex were anchored at the
beginning and end of your string-to-be-searched.  This means that the
predicate ...matches 'battery' only returns strings that are exactly
"battery", instead of strings that contain "battery".

Try using ...matches '.*battery.*' instead.
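
To see the difference outside of Pig, here is a minimal Java sketch of the
same whole-string matching behavior. The class name, helper names, and
sample titles are made up for illustration (they are not from the original
script), and this only demonstrates Java's own regex semantics rather than
Pig's internals.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Whole-string match: true only if the regex covers the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring search, shown only for contrast with the whole-string match.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Hypothetical titles, invented for this example.
        String[] titles = {"battery", "AA battery 4-pack", "phone charger"};
        for (String title : titles) {
            System.out.println(title
                + " | 'battery' full match: " + fullMatch("battery", title)
                + " | '.*battery.*' full match: " + fullMatch(".*battery.*", title)
                + " | 'battery' found anywhere: " + containsMatch("battery", title));
        }
    }
}
```

Only the title that is exactly "battery" passes the bare pattern, while the
'.*battery.*' form also accepts titles that merely contain the word.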

Norbert

On Mon, Mar 26, 2012 at 3:02 PM, Jason Alexander <[EMAIL PROTECTED]> wrote:

> Thanks Prashant,
>
>
> Well, before I wasn't getting any specific error, I was just getting
> nothing written out.
>
> Updating the script based on your feedback, the output I get is:
>
> battery 303
>
> Which I assume is the total number of records that have the word "battery"
> in the title.
>
> Ultimately, what I would like to see is:
>
> battery title 1                 15
> battery title 2                 304
> battery title 3                 573
> .
> .
> .
>
>
> How can I accomplish that?
>
>
> Thanks again for all your help,
> -Jason
>
> On Mar 26, 2012, at 12:43 PM, Prashant Kommireddi wrote:
>
> > You need to use the implicit 'group' to reference title. The error was
> > pretty clear in this case.
> >
> > grunt> scancount               = FOREACH groupedscans GENERATE title,
> > COUNT(productscans);
> > 2012-03-26 10:41:43,497 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1025:
> > <line 5, column 56> Invalid field projection. Projected field [title] does
> > not exist in schema:
> > group:chararray,productscans:bag{:tuple(thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray)}.
> >
> >
> > Instead use 'group'
> >
> > grunt> scancount               = FOREACH groupedscans GENERATE group,
> > COUNT(productscans);
> >
> > Thanks,
> > Prashant
> >
> > On Mon, Mar 26, 2012 at 10:39 AM, Jason Alexander <[EMAIL PROTECTED]> wrote:
> >
> >> Hey guys,
> >>
> >>
> >>
> >> Continuing on in my Pig education, I'm trying to pivot my previous script
> >> to give me a breakdown of count by title.
> >>
> >> The script I have so far is:
> >>
> >> /* scans grouped by title */
> >>
> >> scans                   = LOAD '/hive/scans/*' USING PigStorage(',') AS
> >> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
> >> productscans    = FILTER scans BY (title MATCHES 'battery');
> >> groupedscans    = GROUP productscans BY title;
> >> scancount               = FOREACH groupedscans GENERATE title,
> >> COUNT(productscans);
> >> --DUMP scancount;
> >> STORE scancount INTO '/output/scans/groupedscans.out';
> >>
> >>
> >>
> >> I'm sure it's something goofy and easy, but any help would be much
> >> appreciated!
> >>
> >>
> >> Thanks,
> >> -Jason
>
>
Jason Alexander 2012-03-26, 19:50