Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - HiveQL for 'rank() over (partition by ... order by ...)'?


Copy link to this message
-
Re: HiveQL for 'rank() over (partition by ... order by ...)'?
Igor Tatarinov 2011-05-25, 14:43
Yes, we have UDF functions that compute cumulative (such a rank) and moving
aggregates.
In each case, the first parameter is the partitioning key so that the
function knows when to 'reset' at the start of a new partition.
To enforce partitioning and ordering, use DISTRIBUTE BY key  / SORT BY key,
sort_attr

One caveat: you have to do DISTRIBUTE/SORT BY in a subquery. The following
doesn't work correctly:

SELECT product_id, date, mavg(product_id, price, 10)
FROM Prices
DISTRIBUTE BY product_id
SORT BY product_id, date

You have to use a subquery:
SELECT ...
FROM (SELECT *
           FROM Prices
           DISTRIBUTE BY ..
           SORT BY ...)
If you have multiple key attributes you have to combine them into a single
key using concat_ws() for example.

As far as the UDF implementation goes, you have to define your function for
every combination of key and base attribute types that you are going to use.
 I don't know any way around that.
Hope this helps
On Wed, May 25, 2011 at 4:45 AM, 김영우 <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm trying to convert Oracle SQL to HiveQL, esp Oracle Analytic functions.
> Is it possible to implement using Hive UDFs or workarounds?
> I would like to hear your experiences and advice.
>
> Thanks in advance.
>
> - Youngwoo
>