Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive, mail # user - SQL windowing functions.( Issue HIVE-896)

Copy link to this message
SQL windowing functions.( Issue HIVE-896)
Butani, Harish 2012-01-17, 06:11

I have been developing a solution that works with Hive. Here is a simple e.g.

from <select county, tract, arealand from geo_header_sf1 where sumlev = 140>

partition by county

order by county, arealand desc


  rank() as r,

  sum(arealand) over rows between unbounded preceding and current row as cum_area

select county, tract, arealand, r, cum_area where <r <= 3>

into path='/tmp/wout'

We have loaded the Census data into our Hive instance. The Query is calculating the Top 3 Tracts(based on land area) by County.

Windowing Queries can be expressed on any Hive Query or Table or on an HDFS file. Windowing processing works in MR mode

or in Hive mode. MR mode generates a MR job to evaluate the windowing functions. In Hive Mode, the Windowing process is

invoked from the Hive Script Operator. Support for 16 functions divided into: Ranking, Aggregation and navigation.

Support for both value and range boundaries for Windows. Easy to extend through groovy integration. Code is available

at https://github.com/hbutani/SQLWindowing. A writeup is available at https://github.com/hbutani/SQLWindowing/blob/master/docs/Windowing.pdf?raw=true

Please send me an email if you are interested in using this.


Harish Butani