NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig user mailing list: Fw: Help with Script


ingvay7 2012-11-13, 17:57
Prashant Kommireddi 2012-11-13, 18:40
ingvay7 2012-11-13, 18:49
pablomar 2012-11-13, 18:36
Help with Script
hey all,

Very new Pig user here. I'm trying to get something I think is very simple done, but I'm getting a few errors. See my script below. Any guidance will be appreciated. Thanks.

I get errors such as: Error during parsing. Invalid alias: server in {time: double,count: double}
I am basically trying to duplicate the following SQL query:

select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
from TableA
group by 1, 2, 3;

My script is as follows:

a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float,
);
--Remove Test Servers
remtest = filter a by not Server matches 'Test%';
-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
--Groupby
G2 = group reqd by Server,Type,Ops;
--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
--byServeroperation = order G3 by Server;
store G3 into 'Servertest';

ingvay7
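A corrected version of the script might look like the following. This is a minimal, untested sketch that assumes the input is tab-delimited text readable by the default PigStorage loader and that COUNT_STAR is available in your Pig version. It addresses the likely causes of the errors: the trailing comma after the last field in the LOAD schema; MATCHES taking a Java regular expression that must match the whole string (so 'Test.*' rather than the SQL LIKE pattern 'Test%'); multiple GROUP keys needing parentheses; and aggregates needing to reference the grouped bag (reqd.U_tm rather than U_tm). After a GROUP, the keys live in a single field named group, which is why ordering G3 by Server fails with "Invalid alias"; flattening group restores Server, Type and Ops as top-level fields. The output names sum_tm and sum_cnt are illustrative choices, not from the original.

```pig
-- Load the report; note there is no trailing comma after the last field.
a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
);

-- Remove test servers. MATCHES uses a Java regex against the whole value,
-- so 'Test.*' means "starts with Test". Rows with a null Server are also dropped.
remtest = FILTER a BY NOT (Server MATCHES 'Test.*');

-- Keep only the columns the aggregation needs, projected by name.
reqd = FOREACH remtest GENERATE Server, Type, Ops, U_cnt, U_tm;

-- Group on all three keys; the key becomes a tuple field named 'group'.
G2 = GROUP reqd BY (Server, Type, Ops);

-- Inside this FOREACH, the original rows are in a bag named 'reqd'.
-- COUNT_STAR counts every row, like SQL count(*).
G3 = FOREACH G2 GENERATE
        FLATTEN(group) AS (Server, Type, Ops),
        COUNT_STAR(reqd) AS users,
        SUM(reqd.U_tm) AS sum_tm,
        SUM(reqd.U_cnt) AS sum_cnt;

-- Server is a top-level field again after FLATTEN, so it can be used in ORDER.
byServeroperation = ORDER G3 BY Server;

STORE byServeroperation INTO 'Servertest';
```

Because U_tm and U_cnt are declared as int, SUM returns long values for sum_tm and sum_cnt.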
Prashant Kommireddi 2012-11-13, 16:59
Vishwanath 2012-11-13, 17:25
pablomar 2012-11-13, 16:59