Why no two aggregations can have different DISTINCT columns ?
Jeff Zhang 2010-02-25, 09:01
Hi all,
I read the Hive tutorial, and it says that "no two aggregations can have different DISTINCT columns". Could anyone explain the reason? Will the following DISTINCT query be translated into a map-reduce job, or will it just run locally?
INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
FROM pv_users
GROUP BY pv_users.gender;

--
Best Regards
Jeff Zhang
Re: Why no two aggregations can have different DISTINCT columns ?
Zheng Shao 2010-02-25, 09:07
This will produce a compilation error. The reason is that we rely on the sort phase in the reducers to detect duplicate values, and the rows can only be sorted one way at a time, so duplicates can only be detected for one DISTINCT column. See https://issues.apache.org/jira/browse/HIVE-537 and https://issues.apache.org/jira/browse/HIVE-474 for details.

--
Yours,
Zheng
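To illustrate the sort-order constraint described in this reply, here is a minimal Python sketch. It is not Hive's reducer code; the sample rows and the helper `count_distinct_sorted` are hypothetical and exist only for illustration. It assumes duplicates are detected by comparing each value with the one immediately before it in a sorted stream, which works for the column the rows are sorted on but not for a second column.

```python
def count_distinct_sorted(values):
    """Count distinct values in a stream by comparing neighbours only.

    This is correct only when equal values are adjacent, i.e. when the
    stream is sorted on the column being counted.
    """
    count = 0
    previous = object()  # sentinel that is unequal to every real value
    for value in values:
        if value != previous:
            count += 1
            previous = value
    return count


# Hypothetical pv_users rows for a single gender: (gender, userid, ip)
rows = [
    ("F", 1, "10.0.0.1"),
    ("F", 2, "10.0.0.2"),
    ("F", 1, "10.0.0.2"),
    ("F", 3, "10.0.0.1"),
]

# Sort on (gender, userid), as a shuffle arranged for count(DISTINCT userid) might.
by_userid = sorted(rows, key=lambda r: (r[0], r[1]))

userids = [r[1] for r in by_userid]  # equal userids are now adjacent
ips = [r[2] for r in by_userid]      # equal ips are NOT guaranteed adjacent

distinct_userids = count_distinct_sorted(userids)  # correct for the sort column
naive_distinct_ips = count_distinct_sorted(ips)    # overcounts the other column
true_distinct_ips = len(set(ips))                  # needs memory for every value

print(distinct_userids, naive_distinct_ips, true_distinct_ips)
```

With a single sort order, the neighbour comparison gives the right answer for `userid` but overcounts `ip`, which is one way to picture why a second DISTINCT column would need either another sort or extra state in the reducer.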
Re: Why no two aggregations can have different DISTINCT columns ?
Mafish Liu 2010-02-25, 09:16
Hive does not support multi-distinct in one query.
We have implemented multi-distinct on top of hive 0.4.2rc to meet our own needs. We don't know whether the Hive project is interested in this feature.
Re: Why no two aggregations can have different DISTINCT columns ?
Zheng Shao 2010-02-25, 09:20
Yes, definitely. Do you want to open a JIRA and post a patch? Please link the new JIRA to the other two JIRAs that were mentioned in this email thread.
Zheng
Re: Why no two aggregations can have different DISTINCT columns ?
Amr Awadallah 2010-02-25, 09:25
+1, please post jira/patch.
-- amr
Re: Why no two aggregations can have different DISTINCT columns ?
Mafish Liu 2010-02-25, 10:11
I'll open a JIRA. The patch will be posted once the code and documentation have been put in order.
Re: Why no two aggregations can have different DISTINCT columns ?
Todd Lipcon 2010-02-25, 15:46
I think you can use this existing JIRA: http://issues.apache.org/jira/browse/HIVE-474

Thanks
-Todd
Re: Why no two aggregations can have different DISTINCT columns ?
Mafish Liu 2010-02-26, 04:06
2010/2/25 Todd Lipcon:
> I think you can use this existing JIRA:
> http://issues.apache.org/jira/browse/HIVE-474

I'm using this JIRA. Thanks.
Re: Why no two aggregations can have different DISTINCT columns ?
Mafish Liu 2010-03-30, 09:22
Patch uploaded. Please review it at https://issues.apache.org/jira/browse/HIVE-474
Re: Why no two aggregations can have different DISTINCT columns ?
Mafish Liu 2010-02-25, 09:23
Here are our multi-distinct results:
hive> describe classes;
OK
name    string
number  string
class   string
Time taken: 0.122 seconds

hive> select * from classes;
OK
1    11     8
2    22     12
4    212    2
5    232    23
6    22     2
7    22     2
3    333    13
3    33     3
4    133    32
5    33     3
Time taken: 0.154 seconds

hive> select count(distinct name), count(distinct number), class from classes group by class;
....
1    1    12
1    1    13
3    2    2
1    1    23
2    1    3
1    1    32
1    1    8
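The figures in the session output above can be cross-checked outside Hive. The following Python sketch recomputes both distinct counts per class from the ten rows printed by `select * from classes`. It shows only the expected result, not how the patched Hive computes it. The variable names are illustrative, and the groups are sorted as strings only to match the order in which the output above lists them.

```python
from collections import defaultdict

# Rows of the classes table as printed above: (name, number, class).
# All three columns are declared as string, so they are kept as strings here.
rows = [
    ("1", "11", "8"),
    ("2", "22", "12"),
    ("4", "212", "2"),
    ("5", "232", "23"),
    ("6", "22", "2"),
    ("7", "22", "2"),
    ("3", "333", "13"),
    ("3", "33", "3"),
    ("4", "133", "32"),
    ("5", "33", "3"),
]

# One set per class for each DISTINCT column.
names_by_class = defaultdict(set)
numbers_by_class = defaultdict(set)
for name, number, cls in rows:
    names_by_class[cls].add(name)
    numbers_by_class[cls].add(number)

# (count(distinct name), count(distinct number), class), ordered by class as a string.
result = [
    (len(names_by_class[cls]), len(numbers_by_class[cls]), cls)
    for cls in sorted(names_by_class)
]

for row in result:
    print(*row)
```

Keeping one set per DISTINCT column sidesteps the single-sort-order limitation, at the cost of holding every distinct value of a group in memory.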