|
Rakesh Setty
2009-07-02, 19:37
Namit Jain
2009-07-02, 20:24
Rakesh Setty
2009-07-02, 20:54
Namit Jain
2009-07-02, 21:23
Rakesh Setty
2009-07-02, 21:46
Amr Awadallah
2009-07-03, 00:53
Namit Jain
2009-07-03, 00:59
Rakesh Setty
2009-07-06, 16:45
Namit Jain
2009-07-06, 17:03
Ashish Thusoo
2009-07-06, 19:16
Rakesh Setty
2009-07-06, 20:19
Namit Jain
2009-07-06, 20:29
Rakesh Setty
2009-07-06, 20:35
Namit Jain
2009-07-06, 21:21
Namit Jain
2009-07-07, 19:01
Namit Jain
2009-07-08, 17:34
|
-
distinct with union allRakesh Setty 2009-07-02, 19:37
Hi,
I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Rakesh Setty 2009-07-02, 19:37
-
RE: distinct with union allNamit Jain 2009-07-02, 20:24
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED] Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-02, 20:24
-
RE: distinct with union allRakesh Setty 2009-07-02, 20:54
Yes, I am getting duplicate usernames.
________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED] Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Rakesh Setty 2009-07-02, 20:54
-
RE: distinct with union allNamit Jain 2009-07-02, 21:23
Can you do :
explain extended select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and send the plan ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:54 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED] Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-02, 21:23
-
RE: distinct with union allRakesh Setty 2009-07-02, 21:46
Hi Namit,
This is the plan generated. hive> explain extended > > select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF page_views pv)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF user) username)))) (TOK_QUERY (TOK_FROM (TOK_TABREF users u)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF name) username))))) ur)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECTDI (TOK_SELEXPR (TOK_COLREF username))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: null-subquery2:ur-subquery2:u Select Operator expressions: expr: name type: string Select Operator expressions: expr: 0 type: string Union Group By Operator keys: expr: 0 type: string mode: hash Reduce Output Operator key expressions: expr: 0 type: string sort order: + Map-reduce partition columns: expr: 0 type: string tag: -1 null-subquery1:ur-subquery1:pv Select Operator expressions: expr: user type: string Select Operator expressions: expr: 0 type: string Union Group By Operator keys: expr: 0 type: string mode: hash Reduce Output Operator key expressions: expr: 0 type: string sort order: + Map-reduce partition columns: expr: 0 type: string tag: -1 Needs Tagging: false Path -> Alias: /user/serakesh/users_hive /user/serakesh/page_views_hive Path -> Partition: /user/serakesh/users_hive Partition input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat properties: line.delim name users field.delim columns.types string:string:string:string:string:int serialization.ddl struct users { string name, string phone, string address, string city, string state, i32 zip} columns name,phone,address,city,state,zip serialization.format bucket_count -1 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat location /user/serakesh/users_hive serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: users /user/serakesh/page_views_hive Partition input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat properties: line.delim name page_views field.delim columns.types string:int:int:string:bigint:bigint:double serialization.ddl struct page_views { string user, i32 action, i32 timespent, string query_term, i64 ip_addr, i64 time_stamp, double estimated_revenue} columns user,action,timespent,query_term,ip_addr,time_stamp,estimated_revenue serialization.format bucket_count -1 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat location /user/serakesh/page_views_hive serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: page_views Reduce Operator Tree: Group By Operator keys: expr: KEY.0 type: string mode: mergepartial Select Operator expressions: expr: 0 type: string File Output Operator compressed: false GlobalTableId: 0 directory: /tmp/hive-serakesh/135690254.10001.insclause-0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat properties: columns username serialization.format 1 Stage: Stage-0 Fetch Operator limit: -1 Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 2:24 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Can you do : explain extended select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and send the plan ? From: Rakesh Setty +
Rakesh Setty 2009-07-02, 21:46
-
Re: distinct with union allAmr Awadallah 2009-07-03, 00:53
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: > > Yes, I am getting duplicate usernames. > > > > > > ------------------------------------------------------------------------ > > *From:* Namit Jain [mailto:[EMAIL PROTECTED]] > *Sent:* Thursday, July 02, 2009 1:25 PM > *To:* [EMAIL PROTECTED] > *Subject:* RE: distinct with union all > > > > Are you getting duplicate usernames ? > > > > > > *From:* Rakesh Setty [mailto:[EMAIL PROTECTED]] > *Sent:* Thursday, July 02, 2009 12:37 PM > *To:* [EMAIL PROTECTED] > *Subject:* distinct with union all > > > > Hi, > > > > I have a query like > > > > select distinct username from (select user as username from page_views > pv union all select name as username from users u) ur; > > > > But I see that result is not actually distinct. Am I missing something > here? > > > > Thanks, > > Rakesh > +
Amr Awadallah 2009-07-03, 00:53
-
RE: distinct with union allNamit Jain 2009-07-03, 00:59
Your plan looks OK.
As Amr said, try to find the offending users. From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-03, 00:59
-
RE: distinct with union allRakesh Setty 2009-07-06, 16:45
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Rakesh Setty 2009-07-06, 16:45
-
RE: distinct with union allNamit Jain 2009-07-06, 17:03
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-06, 17:03
-
RE: distinct with union allAshish Thusoo 2009-07-06, 19:16
Can you also send the explain plan outputs for the distinct query?
Ashish ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Ashish Thusoo 2009-07-06, 19:16
-
RE: distinct with union allRakesh Setty 2009-07-06, 20:19
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Rakesh Setty 2009-07-06, 20:19
-
RE: distinct with union allNamit Jain 2009-07-06, 20:29
Can you send the following counts:
select count(distinct user) from Tb1; select count(1) from Tb1; select count(distinct user) from Tb1Debug; select count(1) from Tb1Debug; From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:20 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi, >From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-06, 20:29
-
RE: distinct with union allRakesh Setty 2009-07-06, 20:35
Following are the counts
select count(distinct user) from Tb1; 976272 select count(1) from Tb1; 1642777 select count(distinct user) from Tb1Debug; 653824 select count(1) from Tb1Debug; 1095933 Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:29 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Can you send the following counts: select count(distinct user) from Tb1; select count(1) from Tb1; select count(distinct user) from Tb1Debug; select count(1) from Tb1Debug; From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:20 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi, >From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Rakesh Setty 2009-07-06, 20:35
-
RE: distinct with union allNamit Jain 2009-07-06, 21:21
Similar queries seem to be working for me.
1. Which version of hive are using ? 2. Would it be possible for you to ship the data to us ? Thanks, -namit From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:36 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Following are the counts select count(distinct user) from Tb1; 976272 select count(1) from Tb1; 1642777 select count(distinct user) from Tb1Debug; 653824 select count(1) from Tb1Debug; 1095933 Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:29 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Can you send the following counts: select count(distinct user) from Tb1; select count(1) from Tb1; select count(distinct user) from Tb1Debug; select count(1) from Tb1Debug; From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:20 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi, >From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-06, 21:21
-
RE: distinct with union allNamit Jain 2009-07-07, 19:01
Rakesh,
Did the queries work for you ? Thanks, -namit From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 2:22 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Similar queries seem to be working for me. 1. Which version of hive are using ? 2. Would it be possible for you to ship the data to us ? Thanks, -namit From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:36 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Following are the counts select count(distinct user) from Tb1; 976272 select count(1) from Tb1; 1642777 select count(distinct user) from Tb1Debug; 653824 select count(1) from Tb1Debug; 1095933 Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:29 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Can you send the following counts: select count(distinct user) from Tb1; select count(1) from Tb1; select count(distinct user) from Tb1Debug; select count(1) from Tb1Debug; From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:20 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi, >From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; But I see that result is not actually distinct. Am I missing something here? Thanks, Rakesh +
Namit Jain 2009-07-07, 19:01
-
RE: distinct with union allNamit Jain 2009-07-08, 17:34
Can you tell the exact command to generate the data ?
Detailed instructions. Do I need to install pig, or is the standalone perl file good enough ? Thanks, -namit [njain@dev029 ~/pigmix]$ perl generate_data.pl Usage: generate_data.pl filetype numrows tablename [nosql] Valid filetypes [studenttab, studentcolon, studentusrdef, votertab, reg1459894, textdoc, unicode] Died at generate_data.pl line 145. [njain@dev029 ~/pigmix]$ From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Tuesday, July 07, 2009 1:23 PM To: Namit Jain Subject: RE: distinct with union all Hi Namit, The query that I am working on is test L11 in PigMix converted to Hive. To generate the data, you can use the patch attached to https://issues.apache.org/jira/browse/PIG-200. The exact Hive query I am using is insert overwrite table L11out select distinct username from (select field1 as username from widerow w union all select user as username from page_views pv) ur; The output says 1642777 rows are loaded. If I do select count(distinct user) from L11out I get 652384 as the output. Please let me know if you have any other questions. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Tuesday, July 07, 2009 12:01 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Rakesh, Did the queries work for you ? Thanks, -namit From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 2:22 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Similar queries seem to be working for me. 1. Which version of hive are using ? 2. Would it be possible for you to ship the data to us ? Thanks, -namit From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:36 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Following are the counts select count(distinct user) from Tb1; 976272 select count(1) from Tb1; 1642777 select count(distinct user) from Tb1Debug; 653824 select count(1) from Tb1Debug; 1095933 Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:29 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Can you send the following counts: select count(distinct user) from Tb1; select count(1) from Tb1; select count(distinct user) from Tb1Debug; select count(1) from Tb1Debug; From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 1:20 PM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi, >From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here AAAGd`RH_HaA`AZNgA_WDeq 2 AAAZTMIxhpSjx[Oyb 1 AABIxQR_Un]DLGXNwla\`o 3 AAB[_pudLZeuOe 7 AAClxPCotyLn\qC 2 AAEdZeHVNj\[XECl^Amin 3 AAEkewjwSCDfCgmPbxkQEG` 2 AAFQxFnlnImcPKpvQMM 1 AAGbQojol_EXcPk 1 AAHPZBe^Q`UeMBmlhhcAVV 3 AAHs]A_gvZxfKKCI^\ 2 AAIAGwnFPsrGFxaHYgOAiCHe 2 AAIjiO]r`rkrobp_xRn 1 AAIwuSGOmu_L`YGCoHrmvub` 1 AAJG[RPx\RvEYljsBgDdR 3 AAJ^BFCNpTXt]wOEir[ 1 AAJvlnBfYXNnXulGOHlAeW 2 AAKwheCincrxm_jkbm 2 The number of rows written is varying between the queries insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur; and insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; The query select count(distinct user) from Tb1; returns different value compared to the number of rows created by the first query. It is also different from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 10:03 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Nothing special for these characters. select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username; should return the counts. When you see the duplicate usernames, can you send them - From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Monday, July 06, 2009 9:45 AM To: [EMAIL PROTECTED] Subject: RE: distinct with union all Hi Amr, The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables? I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this? Thanks, Rakesh ________________________________ From: Amr Awadallah [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 5:53 PM To: [EMAIL PROTECTED] Subject: Re: distinct with union all make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username Rakesh Setty wrote: Yes, I am getting duplicate usernames. ________________________________ From: Namit Jain [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 1:25 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: distinct with union all Are you getting duplicate usernames ? From: Rakesh Setty [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 02, 2009 12:37 PM To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv union all select name as u +
Namit Jain 2009-07-08, 17:34
|