|
|
Hi All, Does hive support subquery in select statement?
Given below data, I need counts and percentage counts per category. The divisor in my case is not the total count, but something that is stored in another table.. In T-SQL, I can do subquery in select statement to get my divisor.
Select id, count(*), count(*)/ (select val from tableY where name like ‘xyzdivisor’) >From data Group by category
What am I missing? This looks like a simple case.. Id Category Count Output Percentage Output 1 A A 4 A 40 (4/10)*100 2 A B 3 B 30 3 B C 3 C 30 4 A 5 C 6 C 7 B 8 B 9 A 10 C Thanks, Prashant
Edward Capriolo 2012-05-25, 15:26
No. But hive does support nested selects so must queries can be re-written to accomplish the same thing. Actually I love hives "FROM ( ) SELECT" syntax It takes a while to get used to but it is much more clear then SQL standard switch allows supqueries in all kinds of places and has about 40 difference ways to express the same query. https://cwiki.apache.org/Hive/languagemanual.htmlOn Fri, May 25, 2012 at 11:14 AM, shan s <[EMAIL PROTECTED]> wrote: > Hi All, > Does hive support subquery in select statement? > > Given below data, I need counts and percentage counts per category. The > divisor in my case is not the total count, but something that is stored in > another table.. In T-SQL, I can do subquery in select statement to get my > divisor. > > Select id, count(*), count(*)/ (select val from tableY where name like > ‘xyzdivisor’) > From data > Group by category > > What am I missing? This looks like a simple case.. > > Id Category Count Output Percentage Output > 1 A A 4 A 40 (4/10)*100 > 2 A B 3 B 30 > 3 B C 3 C 30 > 4 A > 5 C > 6 C > 7 B > 8 B > 9 A > 10 C > Thanks, Prashant
Thanks Edward. But I didn't get the trick yet. I was able to use FROM with multiple group bys. But failed to see what to replace the subquery with... Could you please give an example for my use case below. Select id, count(*), count(*)/ (select val from tableY where name like ‘xyzdivisor’) >From data Group by category Thank You, Prashant. On Fri, May 25, 2012 at 8:56 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote: > No. But hive does support nested selects so must queries can be > re-written to accomplish the same thing. Actually I love hives > > "FROM ( ) SELECT" syntax > > It takes a while to get used to but it is much more clear then SQL > standard switch allows supqueries in all kinds of places and has about > 40 difference ways to express the same query. > > https://cwiki.apache.org/Hive/languagemanual.html> > On Fri, May 25, 2012 at 11:14 AM, shan s <[EMAIL PROTECTED]> wrote: > > Hi All, > > Does hive support subquery in select statement? > > > > Given below data, I need counts and percentage counts per category. The > > divisor in my case is not the total count, but something that is stored > in > > another table.. In T-SQL, I can do subquery in select statement to get my > > divisor. > > > > Select id, count(*), count(*)/ (select val from tableY where name like > > ‘xyzdivisor’) > > From data > > Group by category > > > > What am I missing? This looks like a simple case.. > > > > Id Category Count Output Percentage Output > > 1 A A 4 A 40 (4/10)*100 > > 2 A B 3 B 30 > > 3 B C 3 C 30 > > 4 A > > 5 C > > 6 C > > 7 B > > 8 B > > 9 A > > 10 C > > Thanks, Prashant >
select a.id, a.count, a.count/b.val from (select id, count(*) as count from data group by id) a, (select val from tableY ....) b You don't have a join condition, this maybe output very large data. On Mon, May 28, 2012 at 11:45 AM, shan s <[EMAIL PROTECTED]> wrote: > Thanks Edward. But I didn't get the trick yet. > I was able to use FROM with multiple group bys. But failed to see what to > replace the subquery with... > > Could you please give an example for my use case below. > > Select id, count(*), count(*)/ (select val from tableY where name like > ‘xyzdivisor’) > From data > Group by category > > Thank You, > Prashant. > On Fri, May 25, 2012 at 8:56 PM, Edward Capriolo <[EMAIL PROTECTED]> > wrote: >> >> No. But hive does support nested selects so must queries can be >> re-written to accomplish the same thing. Actually I love hives >> >> "FROM ( ) SELECT" syntax >> >> It takes a while to get used to but it is much more clear then SQL >> standard switch allows supqueries in all kinds of places and has about >> 40 difference ways to express the same query. >> >> https://cwiki.apache.org/Hive/languagemanual.html>> >> On Fri, May 25, 2012 at 11:14 AM, shan s <[EMAIL PROTECTED]> wrote: >> > Hi All, >> > Does hive support subquery in select statement? >> > >> > Given below data, I need counts and percentage counts per category. The >> > divisor in my case is not the total count, but something that is stored >> > in >> > another table.. In T-SQL, I can do subquery in select statement to get >> > my >> > divisor. >> > >> > Select id, count(*), count(*)/ (select val from tableY where name like >> > ‘xyzdivisor’) >> > From data >> > Group by category >> > >> > What am I missing? This looks like a simple case.. >> > >> > Id Category Count Output Percentage Output >> > 1 A A 4 A 40 (4/10)*100 >> > 2 A B 3 B 30 >> > 3 B C 3 C 30 >> > 4 A >> > 5 C >> > 6 C >> > 7 B >> > 8 B >> > 9 A >> > 10 C >> > Thanks, Prashant > >
I transled it to below, but I am getting an error. FAILED: Parse Error: line 6:50 mismatched input ',' expecting EOF near 'a' select a.category, a.count, a.count/b.totalCount as percentageValue from (select category, count(*) as count from gt group by cat) a, (select count(*) as totalCount from gt) b ; Thanks for your help! On Mon, May 28, 2012 at 9:37 AM, wd <[EMAIL PROTECTED]> wrote: > select a.id, a.count, a.count/b.val > from (select id, count(*) as count from data group by id) a, > (select val from tableY ....) b > > You don't have a join condition, this maybe output very large data. > > On Mon, May 28, 2012 at 11:45 AM, shan s <[EMAIL PROTECTED]> wrote: > > Thanks Edward. But I didn't get the trick yet. > > I was able to use FROM with multiple group bys. But failed to see what to > > replace the subquery with... > > > > Could you please give an example for my use case below. > > > > Select id, count(*), count(*)/ (select val from tableY where name like > > ‘xyzdivisor’) > > From data > > Group by category > > > > Thank You, > > Prashant. > > On Fri, May 25, 2012 at 8:56 PM, Edward Capriolo <[EMAIL PROTECTED]> > > wrote: > >> > >> No. But hive does support nested selects so must queries can be > >> re-written to accomplish the same thing. Actually I love hives > >> > >> "FROM ( ) SELECT" syntax > >> > >> It takes a while to get used to but it is much more clear then SQL > >> standard switch allows supqueries in all kinds of places and has about > >> 40 difference ways to express the same query. > >> > >> https://cwiki.apache.org/Hive/languagemanual.html> >> > >> On Fri, May 25, 2012 at 11:14 AM, shan s <[EMAIL PROTECTED]> wrote: > >> > Hi All, > >> > Does hive support subquery in select statement? > >> > > >> > Given below data, I need counts and percentage counts per category. > The > >> > divisor in my case is not the total count, but something that is > stored > >> > in > >> > another table.. In T-SQL, I can do subquery in select statement to get > >> > my > >> > divisor. > >> > > >> > Select id, count(*), count(*)/ (select val from tableY where name like > >> > ‘xyzdivisor’) > >> > From data > >> > Group by category > >> > > >> > What am I missing? This looks like a simple case.. > >> > > >> > Id Category Count Output Percentage Output > >> > 1 A A 4 A 40 (4/10)*100 > >> > 2 A B 3 B 30 > >> > 3 B C 3 C 30 > >> > 4 A > >> > 5 C > >> > 6 C > >> > 7 B > >> > 8 B > >> > 9 A > >> > 10 C > >> > Thanks, Prashant > > > > >
group by category
On Mon, May 28, 2012 at 2:20 PM, shan s <[EMAIL PROTECTED]> wrote: > (select category, count(*) as count from gt group by cat) a,
That was a typo in the email, but it still errors after the typo is corrected. Did you try to run it?
Here is the entire script after creating the file, gt.txt from data below....
CREATE EXTERNAL TABLE IF NOT EXISTS gt (id INT, category STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/user/user1/ht/gt'; LOAD DATA LOCAL INPATH 'gt.txt' OVERWRITE INTO TABLE gt; CREATE TABLE IF NOT EXISTS Res (category STRING, count INT, perVal INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; INSERT OVERWRITE TABLE Res select a.category, a.count, b.totalCount from (select category, count(*) as count from gt group by category) a, (select count(*) as totalCount from gt) b ;
On Mon, May 28, 2012 at 1:55 PM, wd <[EMAIL PROTECTED]> wrote:
> group by category > > On Mon, May 28, 2012 at 2:20 PM, shan s <[EMAIL PROTECTED]> wrote: > > (select category, count(*) as count from gt group by cat) a, >
I didn't have run it. I think the sql I write should work in every db....
On Mon, May 28, 2012 at 4:54 PM, shan s <[EMAIL PROTECTED]> wrote: > That was a typo in the email, but it still errors after the typo is > corrected. > Did you try to run it? > > Here is the entire script after creating the file, gt.txt from data > below.... > > CREATE EXTERNAL TABLE IF NOT EXISTS gt (id INT, category STRING) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE > LOCATION '/user/user1/ht/gt'; > LOAD DATA LOCAL INPATH 'gt.txt' OVERWRITE INTO TABLE gt; > CREATE TABLE IF NOT EXISTS Res (category STRING, count INT, perVal INT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; > INSERT OVERWRITE TABLE Res > select a.category, a.count, b.totalCount > from > (select category, count(*) as count from gt group by category) a, >
Maybe you should delete this white line?
> (select count(*) as totalCount from gt) b ; > > On Mon, May 28, 2012 at 1:55 PM, wd <[EMAIL PROTECTED]> wrote: >> >> group by category >> >> On Mon, May 28, 2012 at 2:20 PM, shan s <[EMAIL PROTECTED]> wrote: >> > (select category, count(*) as count from gt group by cat) a, > >
|
|