|
|
-
RE: Group by week?Ashish Thusoo 2009-07-31, 13:51
Hive expects the expressions in the group by clause to be top level expressions in the select list
so select platform, datediff('2007-12-31', pdate)/7, count.... from group by platform, datediff('2007-12-31', pdate)/7 should work. You could also put the select list into a subquery and then do a group by on top of it. For getting numeric sorting you can do an order by CAST(month(pdate) AS INT). That will probably work for you? Ashish ________________________________________ From: Andraz Tori [[EMAIL PROTECTED]] Sent: Friday, July 31, 2009 4:18 AM To: [EMAIL PROTECTED] Subject: Group by week? Two questions: 1. How would one group by week (instead of date or time)? My first idea was the following: INSERT OVERWRITE TABLE platforms_weekly select platform, count(distinct apikey), count(1), date_add(pdate, datediff('2007-12-31', pdate) / 7) from shairlogs group by platform, datediff('2007-12-31', pdate) / 7 but the issue is that Hive can't know there's a subcalculation that is the same in both group part and output part. I tried to use AS statement with group expression, but no luck there [or maybe I didn't know how]. 2. When grouping by months there seems to be a problem of getting the proper 'yyyy-mm' as part of output columns. Here my idea was: INSERT OVERWRITE TABLE platforms_monthly select platform, count(distinct apikey), count(1), concat(concat(CAST(year(pdate) as STRING), '-'), CAST(month(pdate) as STRING)) from shairlogs group by platform, year(pdate), month(pdate)" But the issue is that output yyyy-mm strings are not zero-padded for months and lexical sort then breaks on them... -- Andraz Tori, CTO Zemanta Ltd, New York, London, Ljubljana www.zemanta.com mail: [EMAIL PROTECTED] tel: +386 41 515 767 twitter: andraz, skype: minmax_test |