


Query output formatting
Steven Wong 20101206, 21:01
I have this query to calculate some averages:
select hour, cdn, avg(bitrate) from fact_table group by hour, cdn 1 8 a 1 9 b 2 8 c 3 8 d 3 9 e
But I want the output to be in the following format so that it can be eyeballed/graphed more easily:
1 a b 2 c NULL 3 d e
(The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.)
Is there an easy way to do this  in Hive, Unix, etc.? Suggestions (or, better yet, solutions) are welcome.
I imagine a UDAF could do this (see below), but AFAIK it is not built into Hive.
select hour, some_udaf(abr, cdn, array(8, 9)) from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, cdn) t group by hour
Thanks. Steven

Re: Query output formatting
yongqiang he 20101206, 21:17
You can use collect_set() udaf. (And use lateral view join and explode if you want operate on the set data.)
On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <[EMAIL PROTECTED]> wrote: > I have this query to calculate some averages: > > > > select hour, cdn, avg(bitrate) from fact_table group by hour, cdn > > 1 8 a > > 1 9 b > > 2 8 c > > 3 8 d > > 3 9 e > > > > But I want the output to be in the following format so that it can be > eyeballed/graphed more easily: > > > > 1 a b > > 2 c NULL > > 3 d e > > > > (The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.) > > > > Is there an easy way to do this – in Hive, Unix, etc.? Suggestions (or, > better yet, solutions) are welcome. > > > > I imagine a UDAF could do this (see below), but AFAIK it is not built into > Hive. > > > > select hour, some_udaf(abr, cdn, array(8, 9)) > > from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, > cdn) t > > group by hour > > > > Thanks. > > Steven > >

RE: Query output formatting
Steven Wong 20101206, 22:23
collect_set doesn't output in any particular order, so it can't be used, right?
Original Message From: yongqiang he [mailto:[EMAIL PROTECTED]] Sent: Monday, December 06, 2010 1:18 PM To: [EMAIL PROTECTED] Subject: Re: Query output formatting
You can use collect_set() udaf. (And use lateral view join and explode if you want operate on the set data.)
On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <[EMAIL PROTECTED]> wrote: > I have this query to calculate some averages: > > > > select hour, cdn, avg(bitrate) from fact_table group by hour, cdn > > 1 8 a > > 1 9 b > > 2 8 c > > 3 8 d > > 3 9 e > > > > But I want the output to be in the following format so that it can be > eyeballed/graphed more easily: > > > > 1 a b > > 2 c NULL > > 3 d e > > > > (The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.) > > > > Is there an easy way to do this  in Hive, Unix, etc.? Suggestions (or, > better yet, solutions) are welcome. > > > > I imagine a UDAF could do this (see below), but AFAIK it is not built into > Hive. > > > > select hour, some_udaf(abr, cdn, array(8, 9)) > > from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, > cdn) t > > group by hour > > > > Thanks. > > Steven > >

Re: Query output formatting
Dilip Joseph 20101207, 00:32
If you have a fixed number of known CDNs, the following query can help: SELECT hour, SUM(IF(cdn=8, bitrate,0))/SUM(IF(cdn=8, 1, 0)) avgBitrateCdn8, SUM(IF(cdn=9, bitrate,0))/SUM(IF(cdn=9, 1, 0)) avgBitrateCdn9  You will need more IFs to handle 0 denominators. FROM fact_table GROUP BY hour Dilip On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <[EMAIL PROTECTED]> wrote: > I have this query to calculate some averages: > > > > select hour, cdn, avg(bitrate) from fact_table group by hour, cdn > > 1 8 a > > 1 9 b > > 2 8 c > > 3 8 d > > 3 9 e > > > > But I want the output to be in the following format so that it can be > eyeballed/graphed more easily: > > > > 1 a b > > 2 c NULL > > 3 d e > > > > (The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.) > > > > Is there an easy way to do this – in Hive, Unix, etc.? Suggestions (or, > better yet, solutions) are welcome. > > > > I imagine a UDAF could do this (see below), but AFAIK it is not built into > Hive. > > > > select hour, some_udaf(abr, cdn, array(8, 9)) > > from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, > cdn) t > > group by hour > > > > Thanks. > > Steven > > >  _________________________________________ Dilip Antony Joseph http://csgrad.blogspot.comhttp://www.marydilip.info

RE: Query output formatting
Steven Wong 20101207, 06:36
Good tip, thanks! From: Dilip Joseph [mailto:[EMAIL PROTECTED]] Sent: Monday, December 06, 2010 4:33 PM To: [EMAIL PROTECTED] Subject: Re: Query output formatting If you have a fixed number of known CDNs, the following query can help: SELECT hour, SUM(IF(cdn=8, bitrate,0))/SUM(IF(cdn=8, 1, 0)) avgBitrateCdn8, SUM(IF(cdn=9, bitrate,0))/SUM(IF(cdn=9, 1, 0)) avgBitrateCdn9  You will need more IFs to handle 0 denominators. FROM fact_table GROUP BY hour Dilip On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: I have this query to calculate some averages: select hour, cdn, avg(bitrate) from fact_table group by hour, cdn 1 8 a 1 9 b 2 8 c 3 8 d 3 9 e But I want the output to be in the following format so that it can be eyeballed/graphed more easily: 1 a b 2 c NULL 3 d e (The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.) Is there an easy way to do this  in Hive, Unix, etc.? Suggestions (or, better yet, solutions) are welcome. I imagine a UDAF could do this (see below), but AFAIK it is not built into Hive. select hour, some_udaf(abr, cdn, array(8, 9)) from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, cdn) t group by hour Thanks. Steven  _________________________________________ Dilip Antony Joseph http://csgrad.blogspot.comhttp://www.marydilip.info

