Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Semantics of Rank.


Copy link to this message
-
Re: Semantics of Rank.
Any further help on this, otherwise I'll file a jira.
On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
[EMAIL PROTECTED]> wrote:

> As an example : If I run my query above removing the arg the following is
> thrown.
>
> FAILED: SemanticException Failed to breakup Windowing invocations into
> Groups. At least 1 group must only depend on input columns. Also check for
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
> One or more arguments are expected.
>
>
> Similar issue and fix here:
>
> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>
> Even if it didn't require an arg it still doesn't explain my anomalous
> output.
>
>
>
> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
> [EMAIL PROTECTED]> wrote:
>
>> That isn't true. If you try to run the above HIVE without an argument, it
>> will throw an exception. I have seen other users replicate this problem as
>> well.
>>
>> I can file a JIRA if someone can confirm that my query should work.
>>
>>
>> On Wed, Jul 24, 2013 at 11:02 PM, [EMAIL PROTECTED] <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Analytical function doesn't expect any argument. Rank() itself enough to
>>> sequence based on the window you have defined in partition by. So
>>>
>>> Rank() over (partition by cmscustid  order by orderdate)
>>>
>>> Should work as long as I have wrote right syntax for hive.
>>>
>>> Sent via Rocket from my HTC
>>>
>>> ----- Reply message -----
>>> From: "j.barrett Strausser" <[EMAIL PROTECTED]>
>>> To: <[EMAIL PROTECTED]>
>>> Subject: Semantics of Rank.
>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>
>>>
>>> Thanks for the reply. Perhaps my misunderstanding of the relation between
>>> rank and the windowing function is wrong.
>>>
>>> What I want to achieve for the following is : For a given customer id,
>>> sort his orders. I thought the below would work.
>>>
>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data eh
>>> window w as (partition by cmscustid  order by orderdate);
>>>
>>> The rank function instead returns the rank of the order date over all all
>>> order dates.
>>>
>>> Example snippet from above
>>>
>>> Actual :
>>>
>>> 6758783    27APR2012    94
>>> 6758783    23JUN2012    95
>>> 6758785    14DEC2012    96
>>> 6758795    18DEC2011    97
>>> 6758796    06MAY2012    98
>>> 6758798    24MAR2013    99
>>> 6758799    23NOV2012    100
>>>
>>>
>>> Expected :
>>>
>>> 6758783    27APR2012    1
>>> 6758783    23JUN2012    2
>>> 6758785    14DEC2012    1
>>> 6758795    18DEC2011    1
>>> 6758796    06MAY2012    1
>>> 6758798    24MAR2013    1
>>> 6758799    23NOV2012    1
>>>
>>>
>>> -b
>>>
>>>
>>>
>>>
>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <[EMAIL PROTECTED]
>>> >wrote:
>>>
>>> > the argument to rank is simply some value, whereas the rank function
>>> > compare this value
>>> >  to the previous value received, if value is same, rank returns
>>> ++index,
>>> > otherwise, rank return 1.
>>> > pseudo code:
>>> >
>>> > class Rank {
>>> > int index;
>>> > Object previousValue = null;
>>> > int evaluate(Object value) {
>>> >   if (value == previousValue) {
>>> >     return ++index;
>>> >   }
>>> >   previousValue = value;
>>> >   index = 1;
>>> >   return 1;
>>> > }
>>> >
>>> >
>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>> > [EMAIL PROTECTED]> wrote:
>>> >
>>> >> It seems as though I am required to pass in an argument to RANK().
>>> >>
>>> >> What is the effect of passing this argument in ?
>>> >>
>>> >> In a RANK function the output for RANK should be the number of rows
>>> >> preceding
>>> >> a row in a given window. As  windows are specified by the partition
>>> and
>>> >> order by fields I don't understand the effect of passing in an arg to
>>> RANK
>>> >>
>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist,
>>> Ntile
>>> >> --
>>> >>
>>> >>
>>> >> https://github.com/bearrito

https://github.com/bearrito
@deepbearrito
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB