Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Help with Script


Copy link to this message
-
Re: Help with Script
SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;

On Nov 13, 2012, at 9:58 AM, ingvay7 <[EMAIL PROTECTED]> wrote:

(Apologies for resending but corrected script below)
This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';

----- Original Message -----
From: Prashant Kommireddi <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

hey all,
Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.
I get errors such as  Error during parsing. Invalid alias: serverin {time:
double,count: double}

I am basically trying to duplicate the following SQL query:
select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

from TableA

group by 1, 2, 3;
My script is as follows:
a = LOAD 'Report' AS (

        dt:chararray,

        Server:chararray,

        Type:chararray,

        Ops:chararray,

        UserID:chararray,

        U_cnt:int,

        U_tm:int,

        U_min_tm:int,

        U_max_tm:int,

        U_avg_tm:float,

        );

--Remove Test Servers

remtest = filter a by not Server matches 'Test%';

-- Filter to required columns

reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby

G2 = group reqd by Server,Type,Ops;

--Sum the User Counts and Times

G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

--byServeroperation = order G3 by Server;

store G3 into 'Servertest';
ingvay7