Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Invalid Function rank in HiveQL


Copy link to this message
-
Re: Invalid Function rank in HiveQL
Still it's not working with the use of my rank UDF. Below is the query I am
using

Can anyone help me, what changes I need to make in my below sql query?

CREATE TABLE IF NOT EXISTS TestingTable1
(

BUYER_ID BIGINT,

ITEM_ID BIGINT,

CREATED_TIME STRING
)
*Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for
yesterday's date by sorting the created_time in descending order.*
So what I was thinking is that, with the use of this below query, I will be
getting-

SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC;
All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in
Descending order, But how I can pick TOP 10 for each BUYER_ID.?

This is my SQL Query that I am using with the use of rank UDF.

SELECT buyer_id, item_id, created_time, rk
FROM (
    SELECT buyer_id, item_id, rank(item_id) as rk, created_time
    FROM testingtable1
    DISTRIBUTE BY buyer_id, item_id
    SORT BY buyer_id, item_id, created_time desc
) a
WHERE rk < 10
ORDER BY buyer_id, created_time, rk;

*Raihan Jamal*

On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst
<[EMAIL PROTECTED]>wrote:

> Hi Raihan,
>
> You should use 'rank(buyer_id)' in the order by clause on line 9 in stead
> of the alias 'rk'. I had the same problem, strangely, the alias is not
> resolved when it is in the order by clause.
>
> Other thing, I had some issues when I used this exact same set up for
> ranking results, that the rank UDF was already implemented at the map
> phase. Then you get very strange results. You have to introduce an extra
> subquery for it to work.
>
> Jasper
> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <[EMAIL PROTECTED]> het
> volgende:
>
>  I was not able to understand, This is my below qiuery that I am using
>> currently-
>>
>> SELECT buyer_id, item_id, created_time
>> FROM (
>>     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>     FROM testingtable1
>>     DISTRIBUTE BY buyer_id, item_id
>>     SORT BY buyer_id, item_id, created_time desc
>> ) a
>> WHERE rank < 10
>> ORDER BY buyer_id, created_time, rank;
>>
>>
>> What changes I need to make?
>>
>>
>>
>>
>> *Raihan Jamal*
>>
>>
>>
>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>
>>> try rk in upper select statement as well
>>>
>>>
>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
>>>
>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I
>>>> have created a new UDF Rank and added to classpath also. But when I am
>>>> again running the below query-
>>>>
>>>> SELECT buyer_id, item_id, created_time
>>>> FROM (
>>>>     SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time
>>>>     FROM testingtable1
>>>>     DISTRIBUTE BY buyer_id, item_id
>>>>     SORT BY buyer_id, item_id, created_time desc
>>>> ) a
>>>> WHERE rk < 10
>>>> ORDER BY buyer_id, rk;
>>>>
>>>> I am getting this below error-
>>>>
>>>> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or
>>>> Column Reference rk*
>>>>
>>>>
>>>> Why is it so? Any suggestions?
>>>>
>>>>
>>>> *Raihan Jamal*
>>>>
>>>>
>>>>
>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> hive has no built-in rank function. you'd need to use a user-defined
>>>>> function (UDF) to simulate it. there are a few custom implementations
>>>>> on the net that you can leverage.
>>>>>
>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> > What's wrong with the below query.
>>>>> >
>>>>> >
>>>>> > SELECT buyer_id, item_id, created_time
>>>>> > FROM (
>>>>> >     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>>>> >     FROM testingtable1
>>>>> >     DISTRIBUTE BY buyer_id, item_id
>>>>> >     SORT BY buyer_id, item_id, created_time desc
>>>>> > ) a
>>>>> > WHERE rank < 10
>>>>> > ORDER BY buyer_id, rank
>>>>> >
>>>>> >
>>>>> > I am always getting as -
>>>>> >
>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB