Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Group by week?


Copy link to this message
-
RE: Group by week?
Hive expects the expressions in the group by clause to be top level expressions in the select list

so

select platform, datediff('2007-12-31', pdate)/7, count....
from
group by platform, datediff('2007-12-31', pdate)/7

should work. You could also put the select list into a subquery and then do a group by on top of it.

For getting numeric sorting you can do an order by CAST(month(pdate) AS INT). That will probably work for you?

Ashish
________________________________________
From: Andraz Tori [[EMAIL PROTECTED]]
Sent: Friday, July 31, 2009 4:18 AM
To: [EMAIL PROTECTED]
Subject: Group by week?

Two questions:

1. How would one group by week (instead of date or time)?

My first idea was the following:

INSERT OVERWRITE TABLE platforms_weekly select platform, count(distinct
apikey), count(1), date_add(pdate, datediff('2007-12-31', pdate) / 7)
from shairlogs group by platform, datediff('2007-12-31', pdate) / 7
but the issue is that Hive can't know there's a subcalculation that is
the same in both group part and output part.

I tried to use AS statement with group expression, but no luck there [or
maybe I didn't know how].
2. When grouping by months there seems to be a problem of getting the
proper 'yyyy-mm' as part of output columns.

Here my idea was:
INSERT OVERWRITE TABLE platforms_monthly select platform, count(distinct
apikey), count(1), concat(concat(CAST(year(pdate) as STRING), '-'),
CAST(month(pdate) as STRING)) from shairlogs group by platform,
year(pdate), month(pdate)"

But the issue is that output yyyy-mm strings are not zero-padded for
months and lexical sort then breaks on them...

--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [EMAIL PROTECTED]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test

NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB