Pig user mailing list: how to perform GROUP BY in PIG for this case:


yogesh dhari 2012-09-29, 22:02
Russell Jurney 2012-09-29, 23:15
yogesh dhari 2012-09-29, 23:32

Re: how to perform GROUP BY in PIG for this case:
My bad - you will need to register the Piggybank and jodatime jars. Replace
/me/pig with your pig install path.

register /me/pig/contrib/piggybank/java/piggybank.jar;
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();

define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

That should take care of the error.

This example may help:
https://github.com/rjurney/Collecting-Data/blob/master/src/pig/rfc1123_to_iso8601.pig
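
Putting the pieces from this thread together, a minimal sketch of the whole script might look like the following. It assumes the data sits in a comma-separated file without the surrounding parentheses. The file name rates.csv and the aliases data, iso and grouped are hypothetical, and the 'yyyy-MM-dd' pattern is inferred from the sample dates in the original question.

```pig
-- Register the jars that hold the date UDFs (adjust /me/pig to your install path).
register /me/pig/contrib/piggybank/java/piggybank.jar;
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

-- Hypothetical input: one "2009-08-21,CLI,33.38" record per line.
data = load 'rates.csv' using PigStorage(',') as (dt:chararray, symbol:chararray, rate:double);

-- Convert the date string to ISO 8601 so ISOToMonth can parse it.
iso = foreach data generate CustomFormatToISO(dt, 'yyyy-MM-dd') as iso_dt, rate;

-- Truncate each date to its month, group on that, and take the maximum rate per group.
grouped = group iso by ISOToMonth(iso_dt);
answer = foreach grouped generate group as month, MAX(iso.rate) as max_rate;

dump answer;
```

As far as the Piggybank documentation describes it, ISOToMonth returns a full ISO 8601 timestamp truncated to the start of the month rather than a bare month number such as 08, so the month column may need further formatting if only the number is wanted.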

Russell Jurney http://datasyndrome.com

On Sep 29, 2012, at 4:33 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
Thanks Russell,

I am new to Pig. I tried this command and got this exception:

2012-09-30 04:53:22,995 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1070: Could not resolve ISOToMonth using imports: [,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Is there something more I need to do, like an import or something like that?

Please suggest.

Thanks & regards
Yogesh Kumar

From: [EMAIL PROTECTED]
Date: Sat, 29 Sep 2012 16:15:18 -0700
Subject: Re: how to perform GROUP BY in PIG for this case:
To: [EMAIL PROTECTED]

answer = foreach (group data by ISOToMonth(Date)) generate group as month, MAX(data.rate) as max_rate;

Note: you will need your date in ISO 8601 format. You can use CustomFormatToISO to convert it if it is a string, or UnixToISO if your date is a long.

See:
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/CustomFormatToISO.html

http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/UnixToISO.html

http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToMonth.html
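
For the case where the date is a long, a sketch along these lines may be a starting point. The file name ticks.tsv, the field name ts and the aliases ticks, iso and grouped are hypothetical. It also assumes ts holds Unix time in milliseconds, which is worth checking against the UnixToISO documentation linked above.

```pig
-- The Piggybank date UDFs depend on Joda-Time, so both jars are registered
-- (adjust /me/pig to your install path).
register /me/pig/contrib/piggybank/java/piggybank.jar;
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

define UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

-- Hypothetical tab-separated input with a numeric timestamp in the first field.
ticks = load 'ticks.tsv' as (ts:long, symbol:chararray, rate:double);

-- Convert the long to an ISO 8601 string, then group by the truncated month.
iso = foreach ticks generate UnixToISO(ts) as iso_dt, rate;
grouped = group iso by ISOToMonth(iso_dt);
answer = foreach grouped generate group as month, MAX(iso.rate) as max_rate;
```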
Russell Jurney http://datasyndrome.com
On Sep 29, 2012, at 3:02 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
Hi all,
I have this data, with fields (Date, symbol, rate). I want to group it by month and find the maximum rate value for each month,
like: (08, 36.3), (09, 36.4), (10, 36.8), (11, 37.5) ..
(2009-08-21,CLI,33.38)
(2009-08-24,CLI,33.03)
(2009-08-25,CLI,33.16)
(2009-08-26,CLI,32.78)
(2009-08-27,CLI,32.79)
(2009-08-28,CLI,33.37)
(2009-08-31,CLI,32.51)
(2009-09-11,CLI,34.08)
(2009-09-14,CLI,35.19)
(2009-09-15,CLI,35.82)
(2009-09-16,CLI,36.58)
(2009-09-24,CLI,33.98)
(2009-09-25,CLI,32.44)
(2009-09-28,CLI,33.34)
(2009-09-29,CLI,33.6)
(2009-09-30,CLI,33.24)
(2009-10-01,CLI,31.98)
(2009-10-02,CLI,31.21)
(2009-10-05,CLI,31.31)
(2009-10-21,CLI,32.86)
(2009-10-26,CLI,33.15)
(2009-10-27,CLI,32.71)
(2009-10-28,CLI,32.03)
(2009-10-29,CLI,32.05)
(2009-10-30,CLI,31.88)
(2009-11-02,CLI,31.88)
(2009-11-03,CLI,31.16)
(2009-11-04,CLI,31.47)
(2009-11-09,CLI,31.59)
(2009-11-25,CLI,30.58)
(2009-11-27,CLI,30.19)
(2009-11-30,CLI,30.86)
(2009-12-01,CLI,31.74)
(2009-12-02,CLI,32.62)
(2009-12-03,CLI,33.43)
(2009-12-04,CLI,34.12)
(2009-12-07,CLI,33.77)
(2009-12-08,CLI,33.8)
(2009-12-09,CLI,33.71)
Please help and suggest.
Thanks & Regards
Yogesh Kumar

yogesh dhari 2012-09-30, 03:18
Russell Jurney 2012-09-30, 03:36
yogesh dhari 2012-09-30, 04:58