Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> HiveQL for 'rank() over (partition by ... order by ...)'?


Copy link to this message
-
Re: HiveQL for 'rank() over (partition by ... order by ...)'?
Yes, we have UDF functions that compute cumulative (such a rank) and moving
aggregates.
In each case, the first parameter is the partitioning key so that the
function knows when to 'reset' at the start of a new partition.
To enforce partitioning and ordering, use DISTRIBUTE BY key  / SORT BY key,
sort_attr

One caveat: you have to do DISTRIBUTE/SORT BY in a subquery. The following
doesn't work correctly:

SELECT product_id, date, mavg(product_id, price, 10)
FROM Prices
DISTRIBUTE BY product_id
SORT BY product_id, date

You have to use a subquery:
SELECT ...
FROM (SELECT *
           FROM Prices
           DISTRIBUTE BY ..
           SORT BY ...)
If you have multiple key attributes you have to combine them into a single
key using concat_ws() for example.

As far as the UDF implementation goes, you have to define your function for
every combination of key and base attribute types that you are going to use.
 I don't know any way around that.
Hope this helps
On Wed, May 25, 2011 at 4:45 AM, 김영우 <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm trying to convert Oracle SQL to HiveQL, esp Oracle Analytic functions.
> Is it possible to implement using Hive UDFs or workarounds?
> I would like to hear your experiences and advice.
>
> Thanks in advance.
>
> - Youngwoo
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB