Pig, mail # user - how to perform GROUP BY in PIG for this case:


Re: how to perform GROUP BY in PIG for this case:
Russell Jurney 2012-09-30, 02:21
My bad - you will need to register the Piggybank and jodatime jars. Replace
/me/pig with your pig install path.

register /me/pig/contrib/piggybank/java/piggybank.jar;
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();

define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

That should take care of the error.

This example may help:
https://github.com/rjurney/Collecting-Data/blob/master/src/pig/rfc1123_to_iso8601.pig
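
Putting the pieces from this thread together, a minimal sketch of the whole script might look like the one below. The input path 'rates.csv', the comma delimiter, the field names, and the 'yyyy-MM-dd' format string are assumptions based on the sample data in the original question; none of them are confirmed in the thread, so adjust them to your data.

```pig
-- Paths follow the ones above; replace /me/pig with your pig install path.
register /me/pig/contrib/piggybank/java/piggybank.jar;
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

-- ASSUMPTION: comma-separated input with one (date, symbol, rate) record per line.
data = load 'rates.csv' using PigStorage(',')
       as (date:chararray, symbol:chararray, rate:double);

-- Convert the yyyy-MM-dd string into an ISO8601 datetime string.
iso = foreach data generate
      CustomFormatToISO(date, 'yyyy-MM-dd') as iso_date, symbol, rate;

-- Truncate each date to its month, group on that, and take the max rate per group.
answer = foreach (group iso by ISOToMonth(iso_date)) generate
         group as month, MAX(iso.rate) as max_rate;

dump answer;
```

As far as I know, the group key comes back as an ISO8601 timestamp truncated to the start of the month rather than a bare month number like 08, so data that spans several years stays separated by year.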

Russell Jurney http://datasyndrome.com

On Sep 29, 2012, at 4:33 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
Thanks Russell,

I am new to Pig. I tried this command and got this exception:

2012-09-30 04:53:22,995 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1070: Could not resolve ISOToMonth using imports: [,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Is there something more I need to do, such as an import?

Please suggest.

Thanks & regards
Yogesh Kumar

From: [EMAIL PROTECTED]

Date: Sat, 29 Sep 2012 16:15:18 -0700

Subject: Re: how to perform GROUP BY in PIG for this case:

To: [EMAIL PROTECTED]
answer = foreach (group data by ISOToMonth(Date)) generate group as
month, MAX(data.rate) as max_rate;
Note that you will need your date in ISO8601 format. You can use
CustomFormatToISO to convert it if it is a string, or UnixToISO if
your date is a long.
See:
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/CustomFormatToISO.html
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/UnixToISO.html
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToMonth.html
Russell Jurney http://datasyndrome.com
On Sep 29, 2012, at 3:02 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
Hi all,
I have this data, with fields (Date, symbol, rate).
I want to group it by month and find the maximum rate value
for each month,
like: (08, 36.3), (09, 36.4), (10, 36.8), (11, 37.5) ..
(2009-08-21,CLI,33.38)
(2009-08-24,CLI,33.03)
(2009-08-25,CLI,33.16)
(2009-08-26,CLI,32.78)
(2009-08-27,CLI,32.79)
(2009-08-28,CLI,33.37)
(2009-08-31,CLI,32.51)
(2009-09-11,CLI,34.08)
(2009-09-14,CLI,35.19)
(2009-09-15,CLI,35.82)
(2009-09-16,CLI,36.58)
(2009-09-24,CLI,33.98)
(2009-09-25,CLI,32.44)
(2009-09-28,CLI,33.34)
(2009-09-29,CLI,33.6)
(2009-09-30,CLI,33.24)
(2009-10-01,CLI,31.98)
(2009-10-02,CLI,31.21)
(2009-10-05,CLI,31.31)
(2009-10-21,CLI,32.86)
(2009-10-26,CLI,33.15)
(2009-10-27,CLI,32.71)
(2009-10-28,CLI,32.03)
(2009-10-29,CLI,32.05)
(2009-10-30,CLI,31.88)
(2009-11-02,CLI,31.88)
(2009-11-03,CLI,31.16)
(2009-11-04,CLI,31.47)
(2009-11-09,CLI,31.59)
(2009-11-25,CLI,30.58)
(2009-11-27,CLI,30.19)
(2009-11-30,CLI,30.86)
(2009-12-01,CLI,31.74)
(2009-12-02,CLI,32.62)
(2009-12-03,CLI,33.43)
(2009-12-04,CLI,34.12)
(2009-12-07,CLI,33.77)
(2009-12-08,CLI,33.8)
(2009-12-09,CLI,33.71)
Please help and suggest.
Thanks & Regards
Yogesh Kumar