Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Something wrong with my query to get TOP 3?


Copy link to this message
-
Re: Something wrong with my query to get TOP 3?
Remove pid,time from DISTRIBUTE BY.

On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky <[EMAIL PROTECTED]>wrote:

> Modified Query that I wrote and its not working as expected output is.
>
> *
> *
> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)*
> *FROM (*
> *    SELECT bid, pid, time*
> *    FROM table1*
> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) > '2012-07-09'*
> *    DISTRIBUTE BY bid,pid,time*
> *    SORT BY bid, time desc*
> *) a*
> *WHERE rank(bid) < 3;*
>
>
>
>
>
> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <[EMAIL PROTECTED]>wrote:
>
>> I wrote this query after modifying it-
>>
>> *SELECT buyer_id, item_id, rank(buyer_id), created_time,
>> UNIX_TIMESTAMP(created_time)*
>> *FROM (*
>> *    SELECT buyer_id, item_id, created_time*
>> *    FROM testingtable1*
>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>> int))) = '2012-07-09'*
>> *    DISTRIBUTE BY buyer_id,item_id*
>> *    SORT BY buyer_id, created_time desc*
>> *) a*
>> *WHERE rank(buyer_id) < 3;*
>>
>> And the output I got is which is sligtly wrong as it is missing two rows-
>>
>> *1345653    330760137950       2012-07-09 21:43:29*
>> *1345653    330760137950       2012-07-09 21:42:29*
>> *1345653    330760137950       2012-07-09 21:41:29*
>>
>> These two rows are missing-
>>
>> 1345653    110909316904       2012-07-09 21:29:06
>> 1345653    221065796761       2012-07-09 19:31:48
>>
>>  So full output should be like this-
>>
>> *1345653    330760137950       2012-07-09 21:43:29*
>> *1345653    330760137950       2012-07-09 21:42:29*
>> *1345653    330760137950       2012-07-09 21:41:29*
>> 1345653    110909316904       2012-07-09 21:29:06
>> 1345653    221065796761       2012-07-09 19:31:48
>>
>>
>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <[EMAIL PROTECTED]>wrote:
>>
>>> Can you show me the exact query that I need to do for this particular
>>> problem consideing my scenario? It will be of great help to me. As I am new
>>> to HiveQL.
>>>
>>> I need TOP 3 for those if BID and PID gets matched but with different
>>> timestamp.
>>>
>>>
>>>
>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Your rank() is being evaluated map side. Put your distribute by and
>>>> sort by in an inner query, and then evaluate your rank() in an outer query.
>>>>
>>>> Phil.
>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> This is the below data in my Table1
>>>>>
>>>>>
>>>>> BID       PID                       TIME
>>>>>
>>>>> --------------+-------------------------+--------------------------------
>>>>> 1345653   330760137950       2012-07-09 21:42:29
>>>>> 1345653   330760137950       2012-07-09 21:43:29
>>>>> 1345653   330760137950       2012-07-09 21:40:29
>>>>> 1345653   330760137950       2012-07-09 21:41:29
>>>>> 1345653   110909316904       2012-07-09 21:29:06
>>>>> 1345653   221065796761       2012-07-09 19:31:48
>>>>>
>>>>> So If I need to clarify the above scenario- I have data in above table
>>>>> like this-
>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times but
>>>>> with different timestamps in red color. So I need the output something like
>>>>> this-
>>>>>
>>>>> Output that I need:-
>>>>>
>>>>> *1345653    330760137950       2012-07-09 21:43:29 *
>>>>> *1345653    330760137950       2012-07-09 21:42:29 *
>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>
>>>>> So Basically If BID and PID are same but with different timestamps,
>>>>> then I need TOP 3 sorted with TIME in descending order
>>>>>
>>>>> And for this I created rank UDF (User Defined Function). And I wrote
>>>>> the below query but its not working for me. Can anyone help me on this?
>>>>> *
>>>>> *
>>>>> *
>>>>> *
>>>>> *SELECT buyer_id, item_id, created_time*
>>>>> *    FROM table1*
>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as