|
Mohit Anchlia
2012-05-24, 20:02
Ashish Thusoo
2012-05-24, 20:53
Mohit Anchlia
2012-05-24, 21:13
Roberto Sanabria
2012-05-24, 21:17
Edward Capriolo
2012-05-24, 21:19
Mohit Anchlia
2012-05-24, 21:57
Roberto Sanabria
2012-05-24, 22:06
Gesli, Nicole
2012-05-24, 22:10
Mohit Anchlia
2012-05-24, 22:22
|
-
SQL helpMohit Anchlia 2012-05-24, 20:02
I am new to Hive. I have several SQL from RDBMS database that I need to
convert to hive. What's the best reference for HIVEQL? For now I am trying to figure out how to do this in hive: Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY A_ID ORDER BY t_timestamp) From WEB_DATA Any help would be appreciated.
-
Re: SQL helpAshish Thusoo 2012-05-24, 20:53
Hi Mohit,
Hive does not support window functions afaik. The following link might be useful if you can bring that in... https://github.com/hbutani/SQLWindowing/wiki Not sure if this is being brought into trunk at some point... Ashish On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > I am new to Hive. I have several SQL from RDBMS database that I need to > convert to hive. What's the best reference for HIVEQL? For now I am trying > to figure out how to do this in hive: > > Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY > A_ID ORDER BY t_timestamp) From WEB_DATA > > Any help would be appreciated. >
-
Re: SQL helpMohit Anchlia 2012-05-24, 21:13
I am now trying to do it this way but doesn't work in hive. I think I am
missing something here, can someone please help? select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2 where t2.t_timestamp = t1.t_timestamp) I get: FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min' '(' in expression specification On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > I am new to Hive. I have several SQL from RDBMS database that I need to > convert to hive. What's the best reference for HIVEQL? For now I am trying > to figure out how to do this in hive: > > Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY > A_ID ORDER BY t_timestamp) From WEB_DATA > > Any help would be appreciated. >
-
Re: SQL helpRoberto Sanabria 2012-05-24, 21:17
You can try using where a_id in (subquery). But I don't think hive supports
subqueries in where clauses. You would have to turn this into a join statement. On Thu, May 24, 2012 at 2:13 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > I am now trying to do it this way but doesn't work in hive. I think I am > missing something here, can someone please help? > > select a_id from web_data t1 where a_id = (select min(a_id) from web_data > t2 where t2.t_timestamp = t1.t_timestamp) > > I get: > > > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min' > '(' in expression specification > > > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > >> I am new to Hive. I have several SQL from RDBMS database that I need to >> convert to hive. What's the best reference for HIVEQL? For now I am trying >> to figure out how to do this in hive: >> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY >> A_ID ORDER BY t_timestamp) From WEB_DATA >> >> Any help would be appreciated. >> > > >
-
Re: SQL helpEdward Capriolo 2012-05-24, 21:19
Hive is not SQL 92 compliant or whatever.
https://cwiki.apache.org/Hive/languagemanual.html in particular you can not do subselects inside the in or the where clause. Hive usually have other formulations like left semi join that makes things 'like in' and 'not in' possible. Edward On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote: > I am now trying to do it this way but doesn't work in hive. I think I am > missing something here, can someone please help? > > select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2 > where t2.t_timestamp = t1.t_timestamp) > > I get: > > > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min' > '(' in expression specification > > > > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]> > wrote: >> >> I am new to Hive. I have several SQL from RDBMS database that I need to >> convert to hive. What's the best reference for HIVEQL? For now I am trying >> to figure out how to do this in hive: >> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY >> A_ID ORDER BY t_timestamp) From WEB_DATA >> >> Any help would be appreciated. > > >
-
Re: SQL helpMohit Anchlia 2012-05-24, 21:57
On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote:
> Hive is not SQL 92 compliant or whatever. > > https://cwiki.apache.org/Hive/languagemanual.html > > in particular you can not do subselects inside the in or the where > clause. Hive usually have other formulations like left semi join that > makes things 'like in' and 'not in' possible. > > Thanks. But what I am looking for is to select only those rows that are of min(t_timestamp) for a given a_id. What would be the best way? I guess do some kind of group by and store it in intermediate file and run another select on it? > Edward > On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <[EMAIL PROTECTED]> > wrote: > > I am now trying to do it this way but doesn't work in hive. I think I am > > missing something here, can someone please help? > > > > select a_id from web_data t1 where a_id = (select min(a_id) from > web_data t2 > > where t2.t_timestamp = t1.t_timestamp) > > > > I get: > > > > > > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min' > > '(' in expression specification > > > > > > > > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]> > > wrote: > >> > >> I am new to Hive. I have several SQL from RDBMS database that I need to > >> convert to hive. What's the best reference for HIVEQL? For now I am > trying > >> to figure out how to do this in hive: > >> > >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY > >> A_ID ORDER BY t_timestamp) From WEB_DATA > >> > >> Any help would be appreciated. > > > > > > >
-
Re: SQL helpRoberto Sanabria 2012-05-24, 22:06
"I guess do some kind of group by and store it in intermediate file and run
another select on it?" Yes, that is my recommendation. On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > > > On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote: > >> Hive is not SQL 92 compliant or whatever. >> >> https://cwiki.apache.org/Hive/languagemanual.html >> >> in particular you can not do subselects inside the in or the where >> clause. Hive usually have other formulations like left semi join that >> makes things 'like in' and 'not in' possible. >> >> Thanks. But what I am looking for is to select only those rows that are > of min(t_timestamp) for a given a_id. What would be the best way? I guess > do some kind of group by and store it in intermediate file and run another > select on it? > > >> Edward >> On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <[EMAIL PROTECTED]> >> wrote: >> > I am now trying to do it this way but doesn't work in hive. I think I am >> > missing something here, can someone please help? >> > >> > select a_id from web_data t1 where a_id = (select min(a_id) from >> web_data t2 >> > where t2.t_timestamp = t1.t_timestamp) >> > >> > I get: >> > >> > >> > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' >> 'min' >> > '(' in expression specification >> > >> > >> > >> > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]> >> > wrote: >> >> >> >> I am new to Hive. I have several SQL from RDBMS database that I need to >> >> convert to hive. What's the best reference for HIVEQL? For now I am >> trying >> >> to figure out how to do this in hive: >> >> >> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY >> >> A_ID ORDER BY t_timestamp) From WEB_DATA >> >> >> >> Any help would be appreciated. >> > >> > >> > >> > >
-
Re: SQL helpGesli, Nicole 2012-05-24, 22:10
Try this:
SELECT a.a_id, b.path FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp FROM web_data GROUP BY a_id ) a JOIN web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp ) -Nicole From: Roberto Sanabria <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> Reply-To: <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> Date: Thu, 24 May 2012 15:06:29 -0700 To: <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> Subject: Re: SQL help "I guess do some kind of group by and store it in intermediate file and run another select on it?" Yes, that is my recommendation. On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hive is not SQL 92 compliant or whatever. https://cwiki.apache.org/Hive/languagemanual.html in particular you can not do subselects inside the in or the where clause. Hive usually have other formulations like left semi join that makes things 'like in' and 'not in' possible. Thanks. But what I am looking for is to select only those rows that are of min(t_timestamp) for a given a_id. What would be the best way? I guess do some kind of group by and store it in intermediate file and run another select on it? Edward On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: > I am now trying to do it this way but doesn't work in hive. I think I am > missing something here, can someone please help? > > select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2 > where t2.t_timestamp = t1.t_timestamp) > > I get: > > > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min' > '(' in expression specification > > > > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> > wrote: >> >> I am new to Hive. I have several SQL from RDBMS database that I need to >> convert to hive. What's the best reference for HIVEQL? For now I am trying >> to figure out how to do this in hive: >> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY >> A_ID ORDER BY t_timestamp) From WEB_DATA >> >> Any help would be appreciated. > > >
-
Re: SQL helpMohit Anchlia 2012-05-24, 22:22
On Thu, May 24, 2012 at 3:10 PM, Gesli, Nicole
<[EMAIL PROTECTED]>wrote: > Try this: > > SELECT a.a_id, b.path > FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp > FROM web_data > GROUP BY a_id > ) a JOIN > web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp ) > Awesome that works. I didn't realize I could do it this way. Thanks for your help and others as well. > > -Nicole > > From: Roberto Sanabria <[EMAIL PROTECTED]> > Reply-To: <[EMAIL PROTECTED]> > Date: Thu, 24 May 2012 15:06:29 -0700 > To: <[EMAIL PROTECTED]> > Subject: Re: SQL help > > "I guess do some kind of group by and store it in intermediate file and > run another select on it?" > > Yes, that is my recommendation. > > On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote: > >> >> >> On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote: >> >>> Hive is not SQL 92 compliant or whatever. >>> >>> https://cwiki.apache.org/Hive/languagemanual.html >>> >>> in particular you can not do subselects inside the in or the where >>> clause. Hive usually have other formulations like left semi join that >>> makes things 'like in' and 'not in' possible. >>> >>> Thanks. But what I am looking for is to select only those rows that are >> of min(t_timestamp) for a given a_id. What would be the best way? I guess >> do some kind of group by and store it in intermediate file and run another >> select on it? >> >> >>> Edward >>> On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <[EMAIL PROTECTED]> >>> wrote: >>> > I am now trying to do it this way but doesn't work in hive. I think I >>> am >>> > missing something here, can someone please help? >>> > >>> > select a_id from web_data t1 where a_id = (select min(a_id) from >>> web_data t2 >>> > where t2.t_timestamp = t1.t_timestamp) >>> > >>> > I get: >>> > >>> > >>> > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' >>> 'min' >>> > '(' in expression specification >>> > >>> > >>> > >>> > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <[EMAIL PROTECTED] >>> > >>> > wrote: >>> >> >>> >> I am new to Hive. I have several SQL from RDBMS database that I need >>> to >>> >> convert to hive. What's the best reference for HIVEQL? For now I am >>> trying >>> >> to figure out how to do this in hive: >>> >> >>> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION >>> BY >>> >> A_ID ORDER BY t_timestamp) From WEB_DATA >>> >> >>> >> Any help would be appreciated. >>> > >>> > >>> > >>> >> >> > |