Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Find TOP 10 using HiveQL


+
Raihan Jamal 2012-07-10, 02:30
+
Andes 2012-07-10, 02:56
+
Raihan Jamal 2012-07-10, 03:06
+
Abhishek Tiwari 2012-07-10, 07:03
+
Raihan Jamal 2012-07-10, 07:08
Copy link to this message
-
Re: Find TOP 10 using HiveQL
i thought you managed to solve this with rank??

On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:

> Problem with that approach is, with LIMIT 10, If I am putting after desc,
> then it will get only 10 rows irrespective of BUYER_ID. But I need
> specifically for each BUYER_ID 10 latest rows.
>
>
>
>
> *Raihan Jamal*
>
>
>
> On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari <
> [EMAIL PROTECTED]> wrote:
>
>> Raihan,
>>
>> Andes suggests you use 'limit' after 'desc' ie Hive with sort your query
>> results in descending order and then chop off any more than first 10
>> records.
>> However, Hive will still run the full scan on data since it has to find
>> the sorted list of records for you.
>>
>>
>> On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
>>
>>> This is my below requirement. I need-  *Find `TOP 10` data for each
>>> `BUYER_ID, *So I cannot use LIMIT 10 here in this case.
>>>
>>> This is the below table
>>>
>>>     CREATE TABLE IF NOT EXISTS TestingTable1
>>>     (
>>>     BUYER_ID BIGINT,
>>>     ITEM_ID BIGINT,
>>>     CREATED_TIME STRING
>>>     )
>>>
>>> And this is the below data in the above table-
>>>
>>>     BUYER_ID    |    ITEM_ID   | CREATED_TIME
>>>     ------------+------------------+-----------------------
>>>     1015826235      220003038067        2012-07-09 19:40:21,
>>>     1015826235      300003861266        2012-07-09 18:19:59,
>>>     1015826235      140002997245        2012-07-09 09:23:17,
>>>     1015826235      210002448035        2012-07-09 22:21:11,
>>>     1015826235      260003553381        2012-07-09 07:09:56,
>>>     1015826235      260003553382        2012-07-09 19:40:39,
>>>     1015826235      260003553383        2012-07-09 06:58:47,
>>>     1015826235      260003553384        2012-07-09 07:28:47,
>>>     1015826235      260003553385        2012-07-09 08:48:47,
>>>     1015826235      260003553386        2012-07-09 06:38:47,
>>>     1015826235      260003553387        2012-07-09 05:38:47,
>>>     1015826235      260003553388        2012-07-09 04:55:47,
>>>     1015826235      260003553389        2012-07-09 06:54:37,
>>>     34512201        597245693           2012-07-09 16:20:21,
>>>     34512201        8071787728          2012-07-09 15:19:59,
>>>     34512201        5868222883          2012-07-09 08:23:17,
>>>     34512201        2412180494          2012-07-09 22:21:11,
>>>     34512201        2422054205          2012-07-09 06:09:56,
>>>     34512201        1875744030          2012-07-09 19:40:39,
>>>     34512201        5639158173          2012-07-09 06:58:47,
>>>     34512201        5656232360          2012-07-09 07:28:47,
>>>     34512201        959188449 2012-07-09 08:48:47,
>>>     34512201        4645350592          2012-07-09 06:38:47,
>>>     34512201        5657320532 2012-07-09 05:38:47,
>>>     34512201        290419656539 2012-07-09 04:55:47,
>>>
>>> If you see the above data in the table, there are only two UNIQUE
>>> `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`.
>>> I need only 10 latest record basis on the time for that given day for each
>>> `BUYER_ID`.
>>>
>>> So for this `BUYER_ID` - `34512201` I need 10 latest record basis on
>>> `CREATED_TIME` for that given day only, it means for today's date I need 10
>>> latest record for each `BUYER_ID`.
>>>
>>> And each `BUYER_ID` can have any day's data. But I am specifically
>>> interested for day before  today's data(means yesterday's date always) by
>>> checking at the `CREATED_TIME`
>>>
>>> **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.**
>>>
>>> Sample Output.
>>>
>>>     BUYER_ID    |    ITEM_ID   | CREATED_TIME
>>>     ------------+------------------+-----------------------
>>>     34512201        2412180494          2012-07-09 22:21:11
>>>     34512201        1875744030          2012-07-09 19:40:39
>>>     34512201        597245693           2012-07-09 16:20:21
>>>     34512201        8071787728          2012-07-09 15:19:59
Nitin Pawar
+
Raihan Jamal 2012-07-10, 07:15
+
Wouter de Bie 2012-07-10, 07:24
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB