|
|
-
Built - In Aggregate Function - Standard Deviation
Matt Pestritto 2009-05-26, 20:02
Hi.
Are there plans to write a standard deviation aggregate function ? I had to build my own which translated into multiple hive queries. While it works, a build-in function would have been much easier.
Thanks -Matt
+
Matt Pestritto 2009-05-26, 20:02
-
Re: Built - In Aggregate Function - Standard Deviation
Amr Awadallah 2009-05-27, 08:24
I agree that a builtin for std dev is a good idea.
that said, you can achieve this easy in one pass, just use:
select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev from ....
Matt Pestritto wrote: > Hi. > > Are there plans to write a standard deviation aggregate function ? I > had to build my own which translated into multiple hive queries. > While it works, a build-in function would have been much easier. > > Thanks > -Matt
+
Amr Awadallah 2009-05-27, 08:24
-
Re: Built - In Aggregate Function - Standard Deviation
Matt Pestritto 2009-05-30, 23:08
Ah - Thanks so much. I didn't know you could reference a column that has an aggregate function call again in the same select statement. This is much cleaner that the approach that I took. I'll give it a shot.
Thanks again.
On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[EMAIL PROTECTED]> wrote:
> I agree that a builtin for std dev is a good idea. > > that said, you can achieve this easy in one pass, just use: > > select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow( > (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev > from .... > > > Matt Pestritto wrote: > >> Hi. >> >> Are there plans to write a standard deviation aggregate function ? I had >> to build my own which translated into multiple hive queries. While it >> works, a build-in function would have been much easier. >> >> Thanks >> -Matt >> >
+
Matt Pestritto 2009-05-30, 23:08
-
Re: Built - In Aggregate Function - Standard Deviation
Zheng Shao 2009-05-30, 23:22
Hive does not support that right now, but Hive supports subquery so we can do:
SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) ) FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS totalsqr FROM t) t2;
Zheng
On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[EMAIL PROTECTED]> wrote:
> Ah - Thanks so much. I didn't know you could reference a column that has an > aggregate function call again in the same select statement. This is much > cleaner that the approach that I took. I'll give it a shot. > > Thanks again. > > > On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[EMAIL PROTECTED]> wrote: > >> I agree that a builtin for std dev is a good idea. >> >> that said, you can achieve this easy in one pass, just use: >> >> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow( >> (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev >> from .... >> >> >> Matt Pestritto wrote: >> >>> Hi. >>> >>> Are there plans to write a standard deviation aggregate function ? I had >>> to build my own which translated into multiple hive queries. While it >>> works, a build-in function would have been much easier. >>> >>> Thanks >>> -Matt >>> >> > -- Yours, Zheng
+
Zheng Shao 2009-05-30, 23:22
-
Re: Built - In Aggregate Function - Standard Deviation
Amr Awadallah 2009-05-31, 07:04
Zheng, is SQRT an undocumented builtin UDF? I couldn't see it in the lang manual at: http://wiki.apache.org/hadoop/Hive/LanguageManual/UDFhence why I did pow(x,0.5) instead -- amr > Hive does not support that right now, but Hive supports subquery so we > can do: > > SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) ) > FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS > totalsqr FROM t) t2; > > Zheng > > On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > Ah - Thanks so much. I didn't know you could reference a column > that has an aggregate function call again in the same select > statement. This is much cleaner that the approach that I took. > I'll give it a shot. > > Thanks again. > > > On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > I agree that a builtin for std dev is a good idea. > > that said, you can achieve this easy in one pass, just use: > > select sum( pow(col,2) ) as totsqr, sum( col ) as tot, > count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) > as stddev > from .... > > > Matt Pestritto wrote: > > Hi. > > Are there plans to write a standard deviation aggregate > function ? I had to build my own which translated into > multiple hive queries. While it works, a build-in > function would have been much easier. > > Thanks > -Matt > > > > > > -- > Yours, > Zheng
+
Amr Awadallah 2009-05-31, 07:04
-
Re: Built - In Aggregate Function - Standard Deviation
Zheng Shao 2009-05-31, 09:35
Good catch. I just added that to wiki. Thanks Amr. Zheng On Sun, May 31, 2009 at 12:04 AM, Amr Awadallah <[EMAIL PROTECTED]> wrote: > Zheng, > > is SQRT an undocumented builtin UDF? I couldn't see it in the lang manual > at: > > http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF> > hence why I did pow(x,0.5) instead > > -- amr > > Hive does not support that right now, but Hive supports subquery so we can > do: > > SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) ) > FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS totalsqr > FROM t) t2; > > Zheng > > On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[EMAIL PROTECTED]>wrote: > >> Ah - Thanks so much. I didn't know you could reference a column that has >> an aggregate function call again in the same select statement. This is much >> cleaner that the approach that I took. I'll give it a shot. >> >> Thanks again. >> >> On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[EMAIL PROTECTED]> wrote: >> >>> I agree that a builtin for std dev is a good idea. >>> >>> that said, you can achieve this easy in one pass, just use: >>> >>> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, >>> pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev >>> from .... >>> >>> Matt Pestritto wrote: >>> >>>> Hi. >>>> >>>> Are there plans to write a standard deviation aggregate function ? I >>>> had to build my own which translated into multiple hive queries. While it >>>> works, a build-in function would have been much easier. >>>> >>>> Thanks >>>> -Matt >>>> >>> >> > > > -- > Yours, > Zheng > > > -- Yours, Zheng
+
Zheng Shao 2009-05-31, 09:35
|
|