Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Semantics of Rank.


Copy link to this message
-
Re: Semantics of Rank.
j.barrett Strausser 2013-07-26, 18:37
Any further help on this, otherwise I'll file a jira.
On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
[EMAIL PROTECTED]> wrote:

> As an example : If I run my query above removing the arg the following is
> thrown.
>
> FAILED: SemanticException Failed to breakup Windowing invocations into
> Groups. At least 1 group must only depend on input columns. Also check for
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
> One or more arguments are expected.
>
>
> Similar issue and fix here:
>
> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>
> Even if it didn't require an arg it still doesn't explain my anomalous
> output.
>
>
>
> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
> [EMAIL PROTECTED]> wrote:
>
>> That isn't true. If you try to run the above HIVE without an argument, it
>> will throw an exception. I have seen other users replicate this problem as
>> well.
>>
>> I can file a JIRA if someone can confirm that my query should work.
>>
>>
>> On Wed, Jul 24, 2013 at 11:02 PM, [EMAIL PROTECTED] <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Analytical function doesn't expect any argument. Rank() itself enough to
>>> sequence based on the window you have defined in partition by. So
>>>
>>> Rank() over (partition by cmscustid  order by orderdate)
>>>
>>> Should work as long as I have wrote right syntax for hive.
>>>
>>> Sent via Rocket from my HTC
>>>
>>> ----- Reply message -----
>>> From: "j.barrett Strausser" <[EMAIL PROTECTED]>
>>> To: <[EMAIL PROTECTED]>
>>> Subject: Semantics of Rank.
>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>
>>>
>>> Thanks for the reply. Perhaps my misunderstanding of the relation between
>>> rank and the windowing function is wrong.
>>>
>>> What I want to achieve for the following is : For a given customer id,
>>> sort his orders. I thought the below would work.
>>>
>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data eh
>>> window w as (partition by cmscustid  order by orderdate);
>>>
>>> The rank function instead returns the rank of the order date over all all
>>> order dates.
>>>
>>> Example snippet from above
>>>
>>> Actual :
>>>
>>> 6758783    27APR2012    94
>>> 6758783    23JUN2012    95
>>> 6758785    14DEC2012    96
>>> 6758795    18DEC2011    97
>>> 6758796    06MAY2012    98
>>> 6758798    24MAR2013    99
>>> 6758799    23NOV2012    100
>>>
>>>
>>> Expected :
>>>
>>> 6758783    27APR2012    1
>>> 6758783    23JUN2012    2
>>> 6758785    14DEC2012    1
>>> 6758795    18DEC2011    1
>>> 6758796    06MAY2012    1
>>> 6758798    24MAR2013    1
>>> 6758799    23NOV2012    1
>>>
>>>
>>> -b
>>>
>>>
>>>
>>>
>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <[EMAIL PROTECTED]
>>> >wrote:
>>>
>>> > the argument to rank is simply some value, whereas the rank function
>>> > compare this value
>>> >  to the previous value received, if value is same, rank returns
>>> ++index,
>>> > otherwise, rank return 1.
>>> > pseudo code:
>>> >
>>> > class Rank {
>>> > int index;
>>> > Object previousValue = null;
>>> > int evaluate(Object value) {
>>> >   if (value == previousValue) {
>>> >     return ++index;
>>> >   }
>>> >   previousValue = value;
>>> >   index = 1;
>>> >   return 1;
>>> > }
>>> >
>>> >
>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>> > [EMAIL PROTECTED]> wrote:
>>> >
>>> >> It seems as though I am required to pass in an argument to RANK().
>>> >>
>>> >> What is the effect of passing this argument in ?
>>> >>
>>> >> In a RANK function the output for RANK should be the number of rows
>>> >> preceding
>>> >> a row in a given window. As  windows are specified by the partition
>>> and
>>> >> order by fields I don't understand the effect of passing in an arg to
>>> RANK
>>> >>
>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist,
>>> Ntile
>>> >> --
>>> >>
>>> >>
>>> >> https://github.com/bearrito

https://github.com/bearrito
@deepbearrito