-SQL windowing functions.( Issue HIVE-896)
Butani, Harish 2012-01-17, 06:11
I have been developing a solution that works with Hive. Here is a simple e.g.
from <select county, tract, arealand from geo_header_sf1 where sumlev = 140>
partition by county
order by county, arealand desc
rank() as r,
sum(arealand) over rows between unbounded preceding and current row as cum_area
select county, tract, arealand, r, cum_area where <r <= 3>
We have loaded the Census data into our Hive instance. The Query is calculating the Top 3 Tracts(based on land area) by County.
Windowing Queries can be expressed on any Hive Query or Table or on an HDFS file. Windowing processing works in MR mode
or in Hive mode. MR mode generates a MR job to evaluate the windowing functions. In Hive Mode, the Windowing process is
invoked from the Hive Script Operator. Support for 16 functions divided into: Ranking, Aggregation and navigation.
Support for both value and range boundaries for Windows. Easy to extend through groovy integration. Code is available
at https://github.com/hbutani/SQLWindowing. A writeup is available at https://github.com/hbutani/SQLWindowing/blob/master/docs/Windowing.pdf?raw=true
Please send me an email if you are interested in using this.