Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Problem with rank() and dense_rank()


Copy link to this message
-
Re: Problem with rank() and dense_rank()
Harish Butani 2013-08-20, 02:27
Regarding the issue posted with rank and dense_rank. The example posted was:

CREATE TABLE test (a INT);
EXPLAIN
SELECT
  DENSE_RANK() OVER (PARTITION BY a),
  a
FROM test;

Some comments on this:

1. The underlying issue is the bug that ranking functions had to be lowercase.  This was fixed in Jira 4954 by Edward.
     The ranking functions are rank, dense_rank, percent_rank and cume_dist.
     Some details about this: we check for ranking functions and this check was looking for lowercase names.
     More below on why we do this check.
     The rest of hive: parser, functionRegistry etc is not case sensitive. So from a language perspective these
     functions are not case sensitive.

2. The ranking functions don't accept any arguments. They operate on the expressions specified in the order spec
    of the window clause. Internally the ranking functions are implemented as UDAFs, so what we do is pass the
    order expressions as arguments to the ranking functions.  So from a language perspective invoke ranking functions
   without any arguments.

3. Now the e.g. above is saying partition by 'a'; but no order is specified. Since no order is specified for the ranking functions
    the order expression assumed is the partition expressions, in this it is 'a'. So the output of this e.g. would be all rows would
    have a dense_rank = 1.

4. The explanation for this case: DENSE_RANK() OVER (PARTITION BY a)
you got:
FAILED: SemanticException Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check
for circular dependencies.
Underlying error:
org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
arguments are expected.

The problem is DENSE_RANK didn't get flagged as a ranking function because of the lowercase bug.
This had the effect that we add the order expressions as arguments.
It got compiled into a dense_rank UDAF invocation. Which is than expecting at least 1 arg.

5. The explanation for this case: dense_rank(a) OVER (PARTITION BY a)

this is the right behavior.
But I can see the confusion. The error message is not helpful. We should say something to the effect that ranking functions work of order expressions.
Adding a Jira for this.

6. What is is 'Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check
for circular dependencies.'

We allow for multiple window invocations in a Query. These may be on different partition and order specifications.
Internally we go through a process of breaking up Window invocations into Groups and do translation on each Group.
Again in this case, we should provide better error messages. This message is confusing.
When there is only 1 Group, we should just output the underlying message.
Adding a Jira for this.
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.