|
|
-
Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 05:40
What's wrong with the below query. SELECT buyer_id, item_id, created_time FROM ( SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rank < 10 ORDER BY buyer_id, rank I am always getting as -
*FAILED: Error in semantic analysis: line 3:30 Invalid Function rank* * *
I am using Hive 0.6 I guess.
*Raihan Jamal*
+
Raihan Jamal 2012-07-10, 05:40
-
Re: Invalid Function rank in HiveQL
Vijay 2012-07-10, 05:51
hive has no built-in rank function. you'd need to use a user-defined function (UDF) to simulate it. there are a few custom implementations on the net that you can leverage.
On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote: > What's wrong with the below query. > > > SELECT buyer_id, item_id, created_time > FROM ( > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rank < 10 > ORDER BY buyer_id, rank > > > I am always getting as - > > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank > > > I am using Hive 0.6 I guess. > > > > Raihan Jamal >
+
Vijay 2012-07-10, 05:51
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 06:42
Thanks for commenting out. Yes I figured that out, its a UDF. So now I have created a new UDF Rank and added to classpath also. But when I am again running the below query-
SELECT buyer_id, item_id, created_time FROM ( SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rk < 10 ORDER BY buyer_id, rk;
I am getting this below error-
*FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or Column Reference rk* Why is it so? Any suggestions? *Raihan Jamal*
On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote:
> hive has no built-in rank function. you'd need to use a user-defined > function (UDF) to simulate it. there are a few custom implementations > on the net that you can leverage. > > On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> > wrote: > > What's wrong with the below query. > > > > > > SELECT buyer_id, item_id, created_time > > FROM ( > > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time > > FROM testingtable1 > > DISTRIBUTE BY buyer_id, item_id > > SORT BY buyer_id, item_id, created_time desc > > ) a > > WHERE rank < 10 > > ORDER BY buyer_id, rank > > > > > > I am always getting as - > > > > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank > > > > > > I am using Hive 0.6 I guess. > > > > > > > > Raihan Jamal > > >
+
Raihan Jamal 2012-07-10, 06:42
-
Re: Invalid Function rank in HiveQL
Nitin Pawar 2012-07-10, 06:52
try rk in upper select statement as well On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
> Thanks for commenting out. Yes I figured that out, its a UDF. So now I > have created a new UDF Rank and added to classpath also. But when I am > again running the below query- > > SELECT buyer_id, item_id, created_time > FROM ( > SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rk < 10 > ORDER BY buyer_id, rk; > > I am getting this below error- > > *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or > Column Reference rk* > > > Why is it so? Any suggestions? > > > *Raihan Jamal* > > > > On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: > >> hive has no built-in rank function. you'd need to use a user-defined >> function (UDF) to simulate it. there are a few custom implementations >> on the net that you can leverage. >> >> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> >> wrote: >> > What's wrong with the below query. >> > >> > >> > SELECT buyer_id, item_id, created_time >> > FROM ( >> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >> > FROM testingtable1 >> > DISTRIBUTE BY buyer_id, item_id >> > SORT BY buyer_id, item_id, created_time desc >> > ) a >> > WHERE rank < 10 >> > ORDER BY buyer_id, rank >> > >> > >> > I am always getting as - >> > >> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank >> > >> > >> > I am using Hive 0.6 I guess. >> > >> > >> > >> > Raihan Jamal >> > >> > > -- Nitin Pawar
+
Nitin Pawar 2012-07-10, 06:52
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 07:00
I was not able to understand, This is my below qiuery that I am using currently-
SELECT buyer_id, item_id, created_time FROM ( SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rank < 10 ORDER BY buyer_id, created_time, rank; What changes I need to make? *Raihan Jamal*
On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
> try rk in upper select statement as well > > > On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote: > >> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >> have created a new UDF Rank and added to classpath also. But when I am >> again running the below query- >> >> SELECT buyer_id, item_id, created_time >> FROM ( >> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >> FROM testingtable1 >> DISTRIBUTE BY buyer_id, item_id >> SORT BY buyer_id, item_id, created_time desc >> ) a >> WHERE rk < 10 >> ORDER BY buyer_id, rk; >> >> I am getting this below error- >> >> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >> Column Reference rk* >> >> >> Why is it so? Any suggestions? >> >> >> *Raihan Jamal* >> >> >> >> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: >> >>> hive has no built-in rank function. you'd need to use a user-defined >>> function (UDF) to simulate it. there are a few custom implementations >>> on the net that you can leverage. >>> >>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> >>> wrote: >>> > What's wrong with the below query. >>> > >>> > >>> > SELECT buyer_id, item_id, created_time >>> > FROM ( >>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>> > FROM testingtable1 >>> > DISTRIBUTE BY buyer_id, item_id >>> > SORT BY buyer_id, item_id, created_time desc >>> > ) a >>> > WHERE rank < 10 >>> > ORDER BY buyer_id, rank >>> > >>> > >>> > I am always getting as - >>> > >>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank >>> > >>> > >>> > I am using Hive 0.6 I guess. >>> > >>> > >>> > >>> > Raihan Jamal >>> > >>> >> >> > > > -- > Nitin Pawar > >
+
Raihan Jamal 2012-07-10, 07:00
-
Re: Invalid Function rank in HiveQL
Nitin Pawar 2012-07-10, 07:04
change this " SELECT buyer_id, item_id, created_time" to " SELECT buyer_id, item_id, created_time, rk" On Tue, Jul 10, 2012 at 12:30 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
> I was not able to understand, This is my below qiuery that I am using > currently- > > SELECT buyer_id, item_id, created_time > FROM ( > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rank < 10 > ORDER BY buyer_id, created_time, rank; > > > What changes I need to make? > > > > > *Raihan Jamal* > > > > On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote: > >> try rk in upper select statement as well >> >> >> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote: >> >>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >>> have created a new UDF Rank and added to classpath also. But when I am >>> again running the below query- >>> >>> SELECT buyer_id, item_id, created_time >>> FROM ( >>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >>> FROM testingtable1 >>> DISTRIBUTE BY buyer_id, item_id >>> SORT BY buyer_id, item_id, created_time desc >>> ) a >>> WHERE rk < 10 >>> ORDER BY buyer_id, rk; >>> >>> I am getting this below error- >>> >>> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >>> Column Reference rk* >>> >>> >>> Why is it so? Any suggestions? >>> >>> >>> *Raihan Jamal* >>> >>> >>> >>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: >>> >>>> hive has no built-in rank function. you'd need to use a user-defined >>>> function (UDF) to simulate it. there are a few custom implementations >>>> on the net that you can leverage. >>>> >>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> >>>> wrote: >>>> > What's wrong with the below query. >>>> > >>>> > >>>> > SELECT buyer_id, item_id, created_time >>>> > FROM ( >>>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>>> > FROM testingtable1 >>>> > DISTRIBUTE BY buyer_id, item_id >>>> > SORT BY buyer_id, item_id, created_time desc >>>> > ) a >>>> > WHERE rank < 10 >>>> > ORDER BY buyer_id, rank >>>> > >>>> > >>>> > I am always getting as - >>>> > >>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank >>>> > >>>> > >>>> > I am using Hive 0.6 I guess. >>>> > >>>> > >>>> > >>>> > Raihan Jamal >>>> > >>>> >>> >>> >> >> >> -- >> Nitin Pawar >> >> > -- Nitin Pawar
+
Nitin Pawar 2012-07-10, 07:04
-
Re: Invalid Function rank in HiveQL
Jasper Knulst 2012-07-10, 07:16
Hi Raihan,
You should use 'rank(buyer_id)' in the order by clause on line 9 in stead of the alias 'rk'. I had the same problem, strangely, the alias is not resolved when it is in the order by clause.
Other thing, I had some issues when I used this exact same set up for ranking results, that the rank UDF was already implemented at the map phase. Then you get very strange results. You have to introduce an extra subquery for it to work.
Jasper Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <[EMAIL PROTECTED]> het volgende:
> I was not able to understand, This is my below qiuery that I am using > currently- > > SELECT buyer_id, item_id, created_time > FROM ( > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rank < 10 > ORDER BY buyer_id, created_time, rank; > > > What changes I need to make? > > > > > *Raihan Jamal* > > > > On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote: > >> try rk in upper select statement as well >> >> >> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote: >> >>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >>> have created a new UDF Rank and added to classpath also. But when I am >>> again running the below query- >>> >>> SELECT buyer_id, item_id, created_time >>> FROM ( >>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >>> FROM testingtable1 >>> DISTRIBUTE BY buyer_id, item_id >>> SORT BY buyer_id, item_id, created_time desc >>> ) a >>> WHERE rk < 10 >>> ORDER BY buyer_id, rk; >>> >>> I am getting this below error- >>> >>> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >>> Column Reference rk* >>> >>> >>> Why is it so? Any suggestions? >>> >>> >>> *Raihan Jamal* >>> >>> >>> >>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: >>> >>>> hive has no built-in rank function. you'd need to use a user-defined >>>> function (UDF) to simulate it. there are a few custom implementations >>>> on the net that you can leverage. >>>> >>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> >>>> wrote: >>>> > What's wrong with the below query. >>>> > >>>> > >>>> > SELECT buyer_id, item_id, created_time >>>> > FROM ( >>>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>>> > FROM testingtable1 >>>> > DISTRIBUTE BY buyer_id, item_id >>>> > SORT BY buyer_id, item_id, created_time desc >>>> > ) a >>>> > WHERE rank < 10 >>>> > ORDER BY buyer_id, rank >>>> > >>>> > >>>> > I am always getting as - >>>> > >>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank >>>> > >>>> > >>>> > I am using Hive 0.6 I guess. >>>> > >>>> > >>>> > >>>> > Raihan Jamal >>>> > >>>> >>> >>> >> >> >> -- >> Nitin Pawar >> >> >
+
Jasper Knulst 2012-07-10, 07:16
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 19:31
Still it's not working with the use of my rank UDF. Below is the query I am using
Can anyone help me, what changes I need to make in my below sql query?
CREATE TABLE IF NOT EXISTS TestingTable1 (
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING ) *Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for yesterday's date by sorting the created_time in descending order.* So what I was thinking is that, with the use of this below query, I will be getting-
SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC; All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in Descending order, But how I can pick TOP 10 for each BUYER_ID.?
This is my SQL Query that I am using with the use of rank UDF.
SELECT buyer_id, item_id, created_time, rk FROM ( SELECT buyer_id, item_id, rank(item_id) as rk, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rk < 10 ORDER BY buyer_id, created_time, rk;
*Raihan Jamal*
On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst <[EMAIL PROTECTED]>wrote:
> Hi Raihan, > > You should use 'rank(buyer_id)' in the order by clause on line 9 in stead > of the alias 'rk'. I had the same problem, strangely, the alias is not > resolved when it is in the order by clause. > > Other thing, I had some issues when I used this exact same set up for > ranking results, that the rank UDF was already implemented at the map > phase. Then you get very strange results. You have to introduce an extra > subquery for it to work. > > Jasper > Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <[EMAIL PROTECTED]> het > volgende: > > I was not able to understand, This is my below qiuery that I am using >> currently- >> >> SELECT buyer_id, item_id, created_time >> FROM ( >> SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >> FROM testingtable1 >> DISTRIBUTE BY buyer_id, item_id >> SORT BY buyer_id, item_id, created_time desc >> ) a >> WHERE rank < 10 >> ORDER BY buyer_id, created_time, rank; >> >> >> What changes I need to make? >> >> >> >> >> *Raihan Jamal* >> >> >> >> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote: >> >>> try rk in upper select statement as well >>> >>> >>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote: >>> >>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >>>> have created a new UDF Rank and added to classpath also. But when I am >>>> again running the below query- >>>> >>>> SELECT buyer_id, item_id, created_time >>>> FROM ( >>>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >>>> FROM testingtable1 >>>> DISTRIBUTE BY buyer_id, item_id >>>> SORT BY buyer_id, item_id, created_time desc >>>> ) a >>>> WHERE rk < 10 >>>> ORDER BY buyer_id, rk; >>>> >>>> I am getting this below error- >>>> >>>> *FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >>>> Column Reference rk* >>>> >>>> >>>> Why is it so? Any suggestions? >>>> >>>> >>>> *Raihan Jamal* >>>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: >>>> >>>>> hive has no built-in rank function. you'd need to use a user-defined >>>>> function (UDF) to simulate it. there are a few custom implementations >>>>> on the net that you can leverage. >>>>> >>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]> >>>>> wrote: >>>>> > What's wrong with the below query. >>>>> > >>>>> > >>>>> > SELECT buyer_id, item_id, created_time >>>>> > FROM ( >>>>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>>>> > FROM testingtable1 >>>>> > DISTRIBUTE BY buyer_id, item_id >>>>> > SORT BY buyer_id, item_id, created_time desc >>>>> > ) a >>>>> > WHERE rank < 10 >>>>> > ORDER BY buyer_id, rank >>>>> > >>>>> > >>>>> > I am always getting as - >>>>> > >>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank
+
Raihan Jamal 2012-07-10, 19:31
-
Re: Invalid Function rank in HiveQL
Vijay 2012-07-10, 23:01
This is a little tricky but this is how it works:
SELECT buyer_id, item_id, rank(item_id), created_time FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rank(item_id) < 10; On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote: > Still it's not working with the use of my rank UDF. Below is the query I am > using > > Can anyone help me, what changes I need to make in my below sql query? > > > CREATE TABLE IF NOT EXISTS TestingTable1 > > > ( > > > BUYER_ID BIGINT, > > > ITEM_ID BIGINT, > > > CREATED_TIME STRING > ) > > > Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for > yesterday's date by sorting the created_time in descending order. > > > So what I was thinking is that, with the use of this below query, I will be > getting- > > > SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC; > > > All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in Descending > order, But how I can pick TOP 10 for each BUYER_ID.? > > This is my SQL Query that I am using with the use of rank UDF. > > SELECT buyer_id, item_id, created_time, rk > FROM ( > SELECT buyer_id, item_id, rank(item_id) as rk, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rk < 10 > ORDER BY buyer_id, created_time, rk; > > > > > > Raihan Jamal > > > > On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst <[EMAIL PROTECTED]> > wrote: >> >> Hi Raihan, >> >> You should use 'rank(buyer_id)' in the order by clause on line 9 in stead >> of the alias 'rk'. I had the same problem, strangely, the alias is not >> resolved when it is in the order by clause. >> >> Other thing, I had some issues when I used this exact same set up for >> ranking results, that the rank UDF was already implemented at the map phase. >> Then you get very strange results. You have to introduce an extra subquery >> for it to work. >> >> Jasper >> >> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <[EMAIL PROTECTED]> het >> volgende: >> >>> I was not able to understand, This is my below qiuery that I am using >>> currently- >>> >>> SELECT buyer_id, item_id, created_time >>> FROM ( >>> SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>> FROM testingtable1 >>> DISTRIBUTE BY buyer_id, item_id >>> SORT BY buyer_id, item_id, created_time desc >>> ) a >>> WHERE rank < 10 >>> ORDER BY buyer_id, created_time, rank; >>> >>> >>> What changes I need to make? >>> >>> >>> >>> >>> Raihan Jamal >>> >>> >>> >>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <[EMAIL PROTECTED]> >>> wrote: >>>> >>>> try rk in upper select statement as well >>>> >>>> >>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <[EMAIL PROTECTED]> >>>> wrote: >>>>> >>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >>>>> have created a new UDF Rank and added to classpath also. But when I am again >>>>> running the below query- >>>>> >>>>> SELECT buyer_id, item_id, created_time >>>>> FROM ( >>>>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >>>>> FROM testingtable1 >>>>> DISTRIBUTE BY buyer_id, item_id >>>>> SORT BY buyer_id, item_id, created_time desc >>>>> ) a >>>>> WHERE rk < 10 >>>>> ORDER BY buyer_id, rk; >>>>> >>>>> I am getting this below error- >>>>> >>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >>>>> Column Reference rk >>>>> >>>>> >>>>> Why is it so? Any suggestions? >>>>> >>>>> >>>>> Raihan Jamal >>>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <[EMAIL PROTECTED]> wrote: >>>>>> >>>>>> hive has no built-in rank function. you'd need to use a user-defined >>>>>> function (UDF) to simulate it. there are a few custom implementations >>>>>> on the net that you can leverage. >>>>>> >>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <[EMAIL PROTECTED]>
+
Vijay 2012-07-10, 23:01
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 23:18
Thanks Vijay for reply. But it doesn't works out the way I needed. I am getting full data back for each BUYER_ID. Let me explain you more.
This is the below data in the table and you can see in the below data this BUYER_ID *- 1015826235 *appears 13 times and this BUYER_ID -*34512201 *appears 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time-
BUYER_ID | ITEM_ID | CREATED_TIME ------------+------------------+----------------------- 1015826235 220003038067 2012-07-09 19:40:21, 1015826235 300003861266 2012-07-09 18:19:59, 1015826235 140002997245 2012-07-09 09:23:17, 1015826235 210002448035 2012-07-09 22:21:11, 1015826235 260003553381 2012-07-09 07:09:56, 1015826235 260003553382 2012-07-09 19:40:39, 1015826235 260003553383 2012-07-09 06:58:47, 1015826235 260003553384 2012-07-09 07:28:47, 1015826235 260003553385 2012-07-09 08:48:47, 1015826235 260003553386 2012-07-09 06:38:47, 1015826235 260003553387 2012-07-09 05:38:47, 1015826235 260003553388 2012-07-09 04:55:47, 1015826235 260003553389 2012-07-09 06:54:37, 34512201 597245693 2012-07-09 16:20:21, 34512201 8071787728 2012-07-09 15:19:59, 34512201 5868222883 2012-07-09 08:23:17, 34512201 2412180494 2012-07-09 22:21:11, 34512201 2422054205 2012-07-09 06:09:56, 34512201 1875744030 2012-07-09 19:40:39, 34512201 5639158173 2012-07-09 06:58:47, 34512201 5656232360 2012-07-09 07:28:47, 34512201 959188449 2012-07-09 08:48:47, 34512201 4645350592 2012-07-09 06:38:47, 34512201 5657320532 2012-07-09 05:38:47, 34512201 290419656539 2012-07-09 04:55:47, So basically if I do it like this-
SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC;
everything will get sorted in descending order for each BUYER_ID and then I need to pick up TOP 10 rows for each BUYER_ID.
And with your query that you just said, it's not working. I am getting full data back like this. Below is the result I got from your query-
34512201 597245693 0 2012-07-09 16:20:21 34512201 959188449 0 2012-07-09 08:48:47 34512201 1875744030 0 2012-07-09 19:40:39 34512201 2412180494 0 2012-07-09 22:21:11 34512201 2422054205 0 2012-07-09 06:09:56 34512201 4645350592 0 2012-07-09 06:38:47 34512201 5639158173 0 2012-07-09 06:58:47 34512201 5656232360 0 2012-07-09 07:28:47 34512201 5657320532 0 2012-07-09 05:38:47 34512201 5868222883 0 2012-07-09 08:23:17 34512201 8071787728 0 2012-07-09 15:19:59 34512201 290419656539 0 2012-07-09 04:55:47 1015826235 140002997245 0 2012-07-09 09:23:17 1015826235 210002448035 0 2012-07-09 22:21:11 1015826235 220003038067 0 2012-07-09 19:40:21 1015826235 260003553381 0 2012-07-09 07:09:56 1015826235 260003553382 0 2012-07-09 19:40:39 1015826235 260003553383 0 2012-07-09 06:58:47 1015826235 260003553384 0 2012-07-09 07:28:47 1015826235 260003553385 0 2012-07-09 08:48:47 1015826235 260003553386 0 2012-07-09 06:38:47 1015826235 260003553387 0 2012-07-09 05:38:47 1015826235 260003553388 0 2012-07-09 04:55:47 1015826235 260003553389 0 2012-07-09 06:54:37 1015826235 300003861266 0 2012-07-09 18:19:59 This is my Rank function that I have created-
package com.example.hive.udf; import org.apache.hadoop.hive.ql.exec.UDF;
public final class Rank extends UDF{ private int counter; private String last_key; public int evaluate(final String key){ if ( !key.equalsIgnoreCase(this.last_key) ) { this.counter = 0; this.last_key = key; } return this.counter++; } } *Raihan Jamal*
On Tue, Jul 10, 2012 at 4:01 PM, Vijay <[EMAIL PROTECTED]> wrote:
+
Raihan Jamal 2012-07-10, 23:18
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-10, 23:21
So the OUTPUT should be like this everything sorted in descending order with time and only top 10 for each BUYER_ID-
Sample Output.
BUYER_ID | ITEM_ID | CREATED_TIME ------------+------------------+----------------------- 34512201 2412180494 2012-07-09 22:21:11 34512201 1875744030 2012-07-09 19:40:39 34512201 597245693 2012-07-09 16:20:21 34512201 8071787728 2012-07-09 15:19:59 34512201 959188449 2012-07-09 08:48:47 34512201 5868222883 2012-07-09 08:23:17 34512201 5656232360 2012-07-09 07:28:47 34512201 5639158173 2012-07-09 06:58:47 34512201 4645350592 2012-07-09 06:38:47 34512201 2422054205 2012-07-09 06:09:56 1015826235 210002448035 2012-07-09 22:21:11 1015826235 260003553382 2012-07-09 19:40:39 1015826235 220003038067 2012-07-09 19:40:21 1015826235 300003861266 2012-07-09 18:19:59 1015826235 140002997245 2012-07-09 09:23:17 1015826235 260003553385 2012-07-09 08:48:47 1015826235 260003553384 2012-07-09 07:28:47 1015826235 260003553381 2012-07-09 07:09:56 1015826235 260003553383 2012-07-09 06:58:47 1015826235 260003553389 2012-07-09 06:54:37
*Raihan Jamal*
On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
> Thanks Vijay for reply. But it doesn't works out the way I needed. I am > getting full data back for each BUYER_ID. Let me explain you more. > > This is the below data in the table and you can see in the below data this > BUYER_ID *- 1015826235 *appears 13 times and this BUYER_ID -*34512201 *appears > 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time- > > BUYER_ID | ITEM_ID | CREATED_TIME > > ------------+------------------+----------------------- > > 1015826235 220003038067 2012-07-09 19:40:21, > > 1015826235 300003861266 2012-07-09 18:19:59, > > 1015826235 140002997245 2012-07-09 09:23:17, > > 1015826235 210002448035 2012-07-09 22:21:11, > > 1015826235 260003553381 2012-07-09 07:09:56, > > 1015826235 260003553382 2012-07-09 19:40:39, > > 1015826235 260003553383 2012-07-09 06:58:47, > > 1015826235 260003553384 2012-07-09 07:28:47, > > 1015826235 260003553385 2012-07-09 08:48:47, > > 1015826235 260003553386 2012-07-09 06:38:47, > > 1015826235 260003553387 2012-07-09 05:38:47, > > 1015826235 260003553388 2012-07-09 04:55:47, > > 1015826235 260003553389 2012-07-09 06:54:37, > > 34512201 597245693 2012-07-09 16:20:21, > > 34512201 8071787728 2012-07-09 15:19:59, > > 34512201 5868222883 2012-07-09 08:23:17, > > 34512201 2412180494 2012-07-09 22:21:11, > > 34512201 2422054205 2012-07-09 06:09:56, > > 34512201 1875744030 2012-07-09 19:40:39, > > 34512201 5639158173 2012-07-09 06:58:47, > > 34512201 5656232360 2012-07-09 07:28:47, > > 34512201 959188449 2012-07-09 08:48:47, > > 34512201 4645350592 2012-07-09 06:38:47, > > 34512201 5657320532 2012-07-09 05:38:47, > > 34512201 290419656539 2012-07-09 04:55:47, > > > So basically if I do it like this- > > SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC; > > everything will get sorted in descending order for each BUYER_ID and then > I need to pick up TOP 10 rows for each BUYER_ID. > > And with your query that you just said, it's not working. I am getting > full data back like this. Below is the result I got from your query- > > 34512201 597245693 0 2012-07-09 16:20:21 > 34512201 959188449 0 2012-07-09 08:48:47 > 34512201 1875744030 0 2012-07-09 19:40:39
+
Raihan Jamal 2012-07-10, 23:21
-
Re: Invalid Function rank in HiveQL
Vijay 2012-07-10, 23:37
In that case, wouldn't this work:
SELECT buyer_id, item_id, rank(buyer_id), created_time FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 DISTRIBUTE BY buyer_id SORT BY buyer_id, created_time desc ) a WHERE rank(buyer_id) < 10; On Tue, Jul 10, 2012 at 4:21 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote: > So the OUTPUT should be like this everything sorted in descending order with > time and only top 10 for each BUYER_ID- > > Sample Output. > > BUYER_ID | ITEM_ID | CREATED_TIME > > > ------------+------------------+----------------------- > > > > 34512201 2412180494 2012-07-09 22:21:11 > > > 34512201 1875744030 2012-07-09 19:40:39 > > > 34512201 597245693 2012-07-09 16:20:21 > > > 34512201 8071787728 2012-07-09 15:19:59 > > > 34512201 959188449 2012-07-09 08:48:47 > > > 34512201 5868222883 2012-07-09 08:23:17 > > > 34512201 5656232360 2012-07-09 07:28:47 > > > 34512201 5639158173 2012-07-09 06:58:47 > > > 34512201 4645350592 2012-07-09 06:38:47 > > > 34512201 2422054205 2012-07-09 06:09:56 > > > 1015826235 210002448035 2012-07-09 22:21:11 > > > 1015826235 260003553382 2012-07-09 19:40:39 > > > 1015826235 220003038067 2012-07-09 19:40:21 > > > 1015826235 300003861266 2012-07-09 18:19:59 > > > 1015826235 140002997245 2012-07-09 09:23:17 > > > 1015826235 260003553385 2012-07-09 08:48:47 > > > 1015826235 260003553384 2012-07-09 07:28:47 > > > 1015826235 260003553381 2012-07-09 07:09:56 > > > 1015826235 260003553383 2012-07-09 06:58:47 > > > 1015826235 260003553389 2012-07-09 06:54:37 > > > > > > Raihan Jamal > > > > On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote: >> >> Thanks Vijay for reply. But it doesn't works out the way I needed. I am >> getting full data back for each BUYER_ID. Let me explain you more. >> >> This is the below data in the table and you can see in the below data this >> BUYER_ID - 1015826235 appears 13 times and this BUYER_ID -34512201 appears >> 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time- >> >> BUYER_ID | ITEM_ID | CREATED_TIME >> >> >> >> ------------+------------------+----------------------- >> >> >> >> 1015826235 220003038067 2012-07-09 19:40:21, >> >> >> >> 1015826235 300003861266 2012-07-09 18:19:59, >> >> >> >> 1015826235 140002997245 2012-07-09 09:23:17, >> >> >> >> 1015826235 210002448035 2012-07-09 22:21:11, >> >> >> >> 1015826235 260003553381 2012-07-09 07:09:56, >> >> >> >> 1015826235 260003553382 2012-07-09 19:40:39, >> >> >> >> 1015826235 260003553383 2012-07-09 06:58:47, >> >> >> >> 1015826235 260003553384 2012-07-09 07:28:47, >> >> >> >> 1015826235 260003553385 2012-07-09 08:48:47, >> >> >> >> 1015826235 260003553386 2012-07-09 06:38:47, >> >> >> >> 1015826235 260003553387 2012-07-09 05:38:47, >> >> >> >> 1015826235 260003553388 2012-07-09 04:55:47, >> >> >> >> 1015826235 260003553389 2012-07-09 06:54:37, >> >> >> >> 34512201 597245693 2012-07-09 16:20:21, >> >> >> >> 34512201 8071787728 2012-07-09 15:19:59, >> >> >> >> 34512201 5868222883 2012-07-09 08:23:17, >> >> >> >> 34512201 2412180494 2012-07-09 22:21:11, >> >> >> >> 34512201 2422054205 2012-07-09 06:09:56, >> >> >> >> 34512201 1875744030 2012-07-09 19:40:39, >> >> >> >> 34512201 5639158173 2012-07-09 06:58:47, >> >> >> >> 34512201 5656232360 2012-07-09 07:28:47, >> >> >> >> 34512201 959188449 2012-07-09 08:48:47, >> >> >> >> 34512201 4645350592 2012-07-09 06:38:47,
+
Vijay 2012-07-10, 23:37
-
Re: Invalid Function rank in HiveQL
Raihan Jamal 2012-07-11, 03:31
Yup this works. Thanks for the help.
*Raihan Jamal*
On Tue, Jul 10, 2012 at 4:37 PM, Vijay <[EMAIL PROTECTED]> wrote:
> In that case, wouldn't this work: > > SELECT buyer_id, item_id, rank(buyer_id), created_time > FROM ( > SELECT buyer_id, item_id, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id > SORT BY buyer_id, created_time desc > ) a > WHERE rank(buyer_id) < 10; > > > On Tue, Jul 10, 2012 at 4:21 PM, Raihan Jamal <[EMAIL PROTECTED]> > wrote: > > So the OUTPUT should be like this everything sorted in descending order > with > > time and only top 10 for each BUYER_ID- > > > > Sample Output. > > > > BUYER_ID | ITEM_ID | CREATED_TIME > > > > > > ------------+------------------+----------------------- > > > > > > > > 34512201 2412180494 2012-07-09 22:21:11 > > > > > > 34512201 1875744030 2012-07-09 19:40:39 > > > > > > 34512201 597245693 2012-07-09 16:20:21 > > > > > > 34512201 8071787728 2012-07-09 15:19:59 > > > > > > 34512201 959188449 2012-07-09 08:48:47 > > > > > > 34512201 5868222883 2012-07-09 08:23:17 > > > > > > 34512201 5656232360 2012-07-09 07:28:47 > > > > > > 34512201 5639158173 2012-07-09 06:58:47 > > > > > > 34512201 4645350592 2012-07-09 06:38:47 > > > > > > 34512201 2422054205 2012-07-09 06:09:56 > > > > > > 1015826235 210002448035 2012-07-09 22:21:11 > > > > > > 1015826235 260003553382 2012-07-09 19:40:39 > > > > > > 1015826235 220003038067 2012-07-09 19:40:21 > > > > > > 1015826235 300003861266 2012-07-09 18:19:59 > > > > > > 1015826235 140002997245 2012-07-09 09:23:17 > > > > > > 1015826235 260003553385 2012-07-09 08:48:47 > > > > > > 1015826235 260003553384 2012-07-09 07:28:47 > > > > > > 1015826235 260003553381 2012-07-09 07:09:56 > > > > > > 1015826235 260003553383 2012-07-09 06:58:47 > > > > > > 1015826235 260003553389 2012-07-09 06:54:37 > > > > > > > > > > > > Raihan Jamal > > > > > > > > On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <[EMAIL PROTECTED]> > wrote: > >> > >> Thanks Vijay for reply. But it doesn't works out the way I needed. I am > >> getting full data back for each BUYER_ID. Let me explain you more. > >> > >> This is the below data in the table and you can see in the below data > this > >> BUYER_ID - 1015826235 appears 13 times and this BUYER_ID -34512201 > appears > >> 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time- > >> > >> BUYER_ID | ITEM_ID | CREATED_TIME > >> > >> > >> > >> ------------+------------------+----------------------- > >> > >> > >> > >> 1015826235 220003038067 2012-07-09 19:40:21, > >> > >> > >> > >> 1015826235 300003861266 2012-07-09 18:19:59, > >> > >> > >> > >> 1015826235 140002997245 2012-07-09 09:23:17, > >> > >> > >> > >> 1015826235 210002448035 2012-07-09 22:21:11, > >> > >> > >> > >> 1015826235 260003553381 2012-07-09 07:09:56, > >> > >> > >> > >> 1015826235 260003553382 2012-07-09 19:40:39, > >> > >> > >> > >> 1015826235 260003553383 2012-07-09 06:58:47, > >> > >> > >> > >> 1015826235 260003553384 2012-07-09 07:28:47, > >> > >> > >> > >> 1015826235 260003553385 2012-07-09 08:48:47, > >> > >> > >> > >> 1015826235 260003553386 2012-07-09 06:38:47, > >> > >> > >> > >> 1015826235 260003553387 2012-07-09 05:38:47, > >> > >> > >> > >> 1015826235 260003553388 2012-07-09 04:55:47, > >> > >> > >> > >> 1015826235 260003553389 2012-07-09 06:54:37, > >> > >> > >> > >> 34512201 597245693 2012-07-09 16:20:21, > >> > >> > >> > >> 34512201 8071787728 2012-07-09 15:19:59, > >> > >> > >> > >> 34512201 5868222883 2012-07-09 08:23:17,
+
Raihan Jamal 2012-07-11, 03:31
|
|