Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Invalid Function rank in HiveQL


Copy link to this message
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 19:31
Still it's not working with the use of my rank UDF. Below is the query I am
using

Can anyone help me, what changes I need to make in my below sql query?

CREATE TABLE IF NOT EXISTS TestingTable1
(

BUYER_ID BIGINT,

ITEM_ID BIGINT,

CREATED_TIME STRING
)
*Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for
yesterday's date by sorting the created_time in descending order.*
So what I was thinking is that, with the use of this below query, I will be
getting-

SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC;
All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in
Descending order, But how I can pick TOP 10 for each BUYER_ID.?

This is my SQL Query that I am using with the use of rank UDF.

SELECT buyer_id, item_id, created_time, rk
FROM (
    SELECT buyer_id, item_id, rank(item_id) as rk, created_time
    FROM testingtable1
    DISTRIBUTE BY buyer_id, item_id
    SORT BY buyer_id, item_id, created_time desc
) a
WHERE rk < 10
ORDER BY buyer_id, created_time, rk;

*Raihan Jamal*

On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst
<[EMAIL PROTECTED]>wrote:

> Hi Raihan,
>
> You should use 'rank(buyer_id)' in the order by clause on line 9 in stead
> of the alias 'rk'. I had the same problem, strangely, the alias is not
> resolved when it is in the order by clause.
>
> Other thing, I had some issues when I used this exact same set up for
> ranking results, that the rank UDF was already implemented at the map
> phase. Then you get very strange results. You have to introduce an extra
> subquery for it to work.
>
> Jasper
> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <[EMAIL PROTECTED]> het
> volgende:
>
>  I was not able to understand, This is my below qiuery that I am using
>> currently-
>>
>> SELECT buyer_id, item_id, created_time
>> FROM (
>>     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>     FROM testingtable1
>>     DISTRIBUTE BY buyer_id, item_id
>>     SORT BY buyer_id, item_id, created_time desc
>> ) a
>> WHERE rank < 10
>> ORDER BY buyer_id, created_time, rank;
>>
>>
>> What changes I need to make?
>>
>>
>>
>>
>> *Raihan Jamal*
>>
>>
>>
>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>
>>> try rk in upper select statement as well
>>>
>>>
>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
>>>
>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I
>>>> have created a new UDF Rank and added to classpath also. But when I am
>>>> again running the below query-
>>>>
>>>> SELECT buyer_id, item_id, created_time
>>>> FROM (
>>>>     SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time
>>>>     FROM testingtable1
>>>>     DISTRIBUTE BY buyer_id, item_id
>>>>     SORT BY buyer_id, item_id, created_time desc
>>>> ) a
>>>> WHERE rk < 10
>>>> ORDER BY buyer_id, rk;
>>>>
>>>> I am getting this below error-
>>>>
>>>> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or
>>>> Column Reference rk*
>>>>
>>>>
>>>> Why is it so? Any suggestions?
>>>>
>>>>
>>>> *Raihan Jamal*
>>>>
>>>>
>>>>
>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> hive has no built-in rank function. you'd need to use a user-defined
>>>>> function (UDF) to simulate it. there are a few custom implementations
>>>>> on the net that you can leverage.
>>>>>
>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> > What's wrong with the below query.
>>>>> >
>>>>> >
>>>>> > SELECT buyer_id, item_id, created_time
>>>>> > FROM (
>>>>> >     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>>>> >     FROM testingtable1
>>>>> >     DISTRIBUTE BY buyer_id, item_id
>>>>> >     SORT BY buyer_id, item_id, created_time desc
>>>>> > ) a
>>>>> > WHERE rank < 10
>>>>> > ORDER BY buyer_id, rank
>>>>> >
>>>>> >
>>>>> > I am always getting as -
>>>>> >
>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank