-Re: HiveQL for 'rank() over (partition by ... order by ...)'?
Igor Tatarinov 2011-05-25, 14:43
Yes, we have UDF functions that compute cumulative (such a rank) and moving
In each case, the first parameter is the partitioning key so that the
function knows when to 'reset' at the start of a new partition.
To enforce partitioning and ordering, use DISTRIBUTE BY key / SORT BY key,
One caveat: you have to do DISTRIBUTE/SORT BY in a subquery. The following
doesn't work correctly:
SELECT product_id, date, mavg(product_id, price, 10)
DISTRIBUTE BY product_id
SORT BY product_id, date
You have to use a subquery:
FROM (SELECT *
DISTRIBUTE BY ..
SORT BY ...)
If you have multiple key attributes you have to combine them into a single
key using concat_ws() for example.
As far as the UDF implementation goes, you have to define your function for
every combination of key and base attribute types that you are going to use.
I don't know any way around that.
Hope this helps
On Wed, May 25, 2011 at 4:45 AM, 김영우 <[EMAIL PROTECTED]> wrote:
> Hi All,
> I'm trying to convert Oracle SQL to HiveQL, esp Oracle Analytic functions.
> Is it possible to implement using Hive UDFs or workarounds?
> I would like to hear your experiences and advice.
> Thanks in advance.
> - Youngwoo