Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Something wrong with my query to get TOP 3?


Copy link to this message
-
Re: Something wrong with my query to get TOP 3?
Actually, never mind. Looks like you need to partition by both bid and pid.
In that case, your problem is that rank() has to handle a combined bid+pid
key. So first you need to create a combined key, partition by that key and
pass it to your rank() function (assuming rank() knows to reset on a new
key). You can cast bid and pid to string and concatenate them with a
separator (bid_pid) to get a single partitioning key. Hope this makes sense.

On Thu, Jul 19, 2012 at 3:57 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:

> Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses.
> Your very first query was correct except for the nested subquery part.
> (You don't need a double-nested subquery.)
>
> On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky <[EMAIL PROTECTED]>wrote:
>
>> Hi Igor,
>>
>> I am not sure what I have to remove from Distribute By as in distribute
>> by we have bid, pid and you said remove bid and time from distribute by and
>> it doesn't have time
>>
>> *SELECT bid, pid, rank FROM *
>>       *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time)
>> FROM
>> *
>>   *
>>            ( SELECT bid, pid, time FROM table1*
>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) >> '2012-07-09' *
>> *            DISTRIBUTE BY bid,pid*
>> *            SORT BY bid,pid, time desc) A
>> *
>> *
>>       ) B
>> *
>> *WHERE rank < 3;*
>>
>>
>> And also I tried running the above query as it is. I am not getting
>> expected output instead of that I am getting output like this which is
>> wrong If you compare my expected output with the below output-
>>
>>  *1345653 110909316904    0*
>> *1345653 221065796761    1*
>> *1345653 330760137950    2*
>>
>>
>> On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>>
>>> Remove pid,time from DISTRIBUTE BY.
>>>
>>> On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky <[EMAIL PROTECTED]
>>> > wrote:
>>>
>>>> Modified Query that I wrote and its not working as expected output is.
>>>>
>>>> *
>>>> *
>>>> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)*
>>>> *FROM (*
>>>> *    SELECT bid, pid, time*
>>>> *    FROM table1*
>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) >>>> '2012-07-09'*
>>>> *    DISTRIBUTE BY bid,pid,time*
>>>> *    SORT BY bid, time desc*
>>>> *) a*
>>>> *WHERE rank(bid) < 3;*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> I wrote this query after modifying it-
>>>>>
>>>>> *SELECT buyer_id, item_id, rank(buyer_id), created_time,
>>>>> UNIX_TIMESTAMP(created_time)*
>>>>> *FROM (*
>>>>> *    SELECT buyer_id, item_id, created_time*
>>>>> *    FROM testingtable1*
>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>>>> int))) = '2012-07-09'*
>>>>> *    DISTRIBUTE BY buyer_id,item_id*
>>>>> *    SORT BY buyer_id, created_time desc*
>>>>> *) a*
>>>>> *WHERE rank(buyer_id) < 3;*
>>>>>
>>>>> And the output I got is which is sligtly wrong as it is missing two
>>>>> rows-
>>>>>
>>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>>
>>>>> These two rows are missing-
>>>>>
>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>
>>>>>  So full output should be like this-
>>>>>
>>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>
>>>>>
>>>>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Can you show me the exact query that I need to do for this particular
>>>>>> problem consideing my scenario? It will be of great help to me. As I am new
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB