Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Help with Script


Copy link to this message
-
Re: Help with Script
SUM function requires that you specify the specific element from the
grouping. In this case, U_tm and U_cnt are both within group/bags and need
to be accessed as "reqd.U_tm" and "reqd.U_cnt".
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
count;

On Nov 13, 2012, at 9:58 AM, ingvay7 <[EMAIL PROTECTED]> wrote:

(Apologies for resending but corrected script below)
This is the error I got:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
matching function for org.apache.pig.builtin.SUM as multiple or none of
them fit. Please use an explicit cast.

Updated code:

a = LOAD 'Report' AS (
        dt:chararray,
        Server:chararray,
        Type:chararray,
        Ops:chararray,
        UserID:chararray,
        U_cnt:int,
        U_tm:int,
        U_min_tm:int,
        U_max_tm:int,
        U_avg_tm:float
        );
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';

-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby
G2 = group reqd by (Server,Type,Ops);

--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

store G3 into 'Servertest';

----- Original Message -----
From: Prashant Kommireddi <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Tuesday, November 13, 2012 11:59 AM
Subject: Re: Help with Script

Hi,

Can you paste the error message here?

Sent from my iPhone

On Nov 13, 2012, at 8:34 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

hey all,
Very new Pig user here. I think I'm trying to get something very simple
done but getting a few errors. See me script below.Any guidance will be
appreciated.Thanks.
I get errors such as  Error during parsing. Invalid alias: serverin {time:
double,count: double}

I am basically trying to duplicate the following SQL query:
select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

from TableA

group by 1, 2, 3;
My script is as follows:
a = LOAD 'Report' AS (

        dt:chararray,

        Server:chararray,

        Type:chararray,

        Ops:chararray,

        UserID:chararray,

        U_cnt:int,

        U_tm:int,

        U_min_tm:int,

        U_max_tm:int,

        U_avg_tm:float,

        );

--Remove Test Servers

remtest = filter a by not Server matches 'Test%';

-- Filter to required columns

reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

--Groupby

G2 = group reqd by Server,Type,Ops;

--Sum the User Counts and Times

G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

--byServeroperation = order G3 by Server;

store G3 into 'Servertest';
ingvay7
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB