|
Matt Pestritto
2009-04-22, 22:22
Amr Awadallah
2009-04-22, 22:40
Matt Pestritto
2009-04-22, 22:43
Matt Pestritto
2009-04-22, 22:46
Prasad Chakka
2009-04-22, 22:49
Matt Pestritto
2009-04-22, 22:53
Ashish Thusoo
2009-04-23, 00:09
Zheng Shao
2009-04-23, 01:35
Jeff Hammerbacher
2009-04-23, 03:22
Matt Pestritto
2009-04-23, 03:38
Zheng Shao
2009-04-23, 08:39
|
-
Aggregrate Query Fails.Matt Pestritto 2009-04-22, 22:22
Hi - I'm having a problem with a query below. When I try to run any
aggregate function on a column from the sub-query, the job fails. The queries and output messages are below. Suggestions? thanks in advance. -- works: 2 map-reduces jobs. select m.description, o_buyers.num as buyers from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; Successful output: PROD7362, 3 -- fails: 3 map-reduce jobs - 2nd reduce fails. select m.description, o_buyers.num as buyers*, count(1) as total* -- sum or max(o_buyers.num) and removing from group by also fails. from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' *group by m.description, o_buyers.num* limit 40 ; java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
-
Re: Aggregrate Query Fails.Amr Awadallah 2009-04-22, 22:40
in the group by, try this instead: *group by m.description, buyers* limit 40 ; Matt Pestritto wrote: > Hi - I'm having a problem with a query below. When I try to run any > aggregate function on a column from the sub-query, the job fails. > The queries and output messages are below. > > Suggestions? > > thanks in advance. > > -- works: 2 map-reduces jobs. > select m.description, o_buyers.num as buyers > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= > '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; > > Successful output: PROD7362, 3 > > -- fails: 3 map-reduce jobs - 2nd reduce fails. > select m.description, o_buyers.num as buyers*, count(1) as total* -- > sum or max(o_buyers.num) and removing from group by also fails. > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= > '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text > > at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) > > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text > at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) > > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) > at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143) > >
-
Re: Aggregrate Query Fails.Matt Pestritto 2009-04-22, 22:43
Thanks.
I tried that also earlier: FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote: > > in the group by, try this instead: > > *group by m.description, buyers* limit 40 ; > > Matt Pestritto wrote: > > Hi - I'm having a problem with a query below. When I try to run any > aggregate function on a column from the sub-query, the job fails. > The queries and output messages are below. > > Suggestions? > > thanks in advance. > > -- works: 2 map-reduces jobs. > select m.description, o_buyers.num as buyers > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; > > Successful output: PROD7362, 3 > > -- fails: 3 map-reduce jobs - 2nd reduce fails. > select m.description, o_buyers.num as buyers*, count(1) as total* -- sum > or max(o_buyers.num) and removing from group by also fails. > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text > > at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) > > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text > at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) > > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) > at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) > > at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) > at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143) > > > >
-
Re: Aggregrate Query Fails.Matt Pestritto 2009-04-22, 22:46
Taking a look at the explain, the first stage of both queries are
identical. The only differences are in the reduce in the 2nd stage. I'm not sure if this helps or not. Thanks. Working Reduce: Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 Left Outer Join0 to 2 condition expressions: 0 {VALUE.0} {VALUE.1} 1 {VALUE.0} {VALUE.1} 2 {VALUE.0} {VALUE.1} Filter Operator predicate: expr: ((1 >= '20090216') and (1 <= '20090217')) type: boolean Select Operator expressions: expr: 3 type: string expr: 5 type: bigint Limit File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Failed Reduce: Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 Left Outer Join0 to 2 condition expressions: 0 {VALUE.0} {VALUE.1} 1 {VALUE.0} {VALUE.1} 2 {VALUE.0} {VALUE.1} Filter Operator predicate: expr: ((1 >= '20090216') and (1 <= '20090217')) type: boolean Group By Operator aggregations: expr: count(1) keys: expr: 3 type: string expr: 5 type: bigint mode: hash File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat name: binary_table On Wed, Apr 22, 2009 at 6:43 PM, Matt Pestritto <[EMAIL PROTECTED]> wrote: > Thanks. > I tried that also earlier: > FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference > buyers > > > On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote: > >> >> in the group by, try this instead: >> >> *group by m.description, buyers* limit 40 ; >> >> Matt Pestritto wrote: >> >> Hi - I'm having a problem with a query below. When I try to run any >> aggregate function on a column from the sub-query, the job fails. >> The queries and output messages are below. >> >> Suggestions? >> >> thanks in advance. >> >> -- works: 2 map-reduces jobs. >> select m.description, o_buyers.num as buyers >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; >> >> Successful output: PROD7362, 3 >> >> -- fails: 3 map-reduce jobs - 2nd reduce fails. >> select m.description, o_buyers.num as buyers*, count(1) as total* -- >> sum or max(o_buyers.num) and removing from group by also fails. >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id)
-
Re: Aggregrate Query Fails.Prasad Chakka 2009-04-22, 22:49
That is strange... Does below also fail?
select m.description, o_buyers.num , count(1) as total from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ; ________________________________ From: Matt Pestritto <[EMAIL PROTECTED]> Reply-To: <[EMAIL PROTECTED]> Date: Wed, 22 Apr 2009 15:43:40 -0700 To: <[EMAIL PROTECTED]> Subject: Re: Aggregrate Query Fails. Thanks. I tried that also earlier: FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote: in the group by, try this instead: group by m.description, buyers limit 40 ; Matt Pestritto wrote: Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails. The queries and output messages are below. Suggestions? thanks in advance. -- works: 2 map-reduces jobs. select m.description, o_buyers.num as buyers from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; Successful output: PROD7362, 3 -- fails: 3 map-reduce jobs - 2nd reduce fails. select m.description, o_buyers.num as buyers, count(1) as total -- sum or max(o_buyers.num) and removing from group by also fails. from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ; java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
-
Re: Aggregrate Query Fails.Matt Pestritto 2009-04-22, 22:53
So the only change was to remove the column alias correct?
Still no luck. Same result. On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED]> wrote: > That is strange... Does below also fail? > > select m.description,* o_buyers.num ,* count(1) as total > > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' > > group by m.description, o_buyers.num limit 40 ; > > > ------------------------------ > *From: *Matt Pestritto <[EMAIL PROTECTED]> > *Reply-To: *<[EMAIL PROTECTED]> > *Date: *Wed, 22 Apr 2009 15:43:40 -0700 > *To: *<[EMAIL PROTECTED]> > *Subject: *Re: Aggregrate Query Fails. > > > Thanks. > I tried that also earlier: > FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference > buyers > > On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote: > > > > in the group by, try this instead: > > *group by m.description, buyers* limit 40 ; > > Matt Pestritto wrote: > > Hi - I'm having a problem with a query below. When I try to run any > aggregate function on a column from the sub-query, the job fails. > The queries and output messages are below. > > Suggestions? > > thanks in advance. > > -- works: 2 map-reduces jobs. > select m.description, o_buyers.num as buyers > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; > > Successful output: PROD7362, 3 > > -- fails: 3 map-reduce jobs - 2nd reduce fails. > select m.description, o_buyers.num as buyers*, count(1) as total* -- sum > or max(o_buyers.num) and removing from group by also fails. > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct > o1.consumer_id) as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > > java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be > cast to org.apache.hadoop.io.Text > > at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) > > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be > cast to org.apache.hadoop.io.Text > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) > > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > at > org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) > > at > org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
-
RE: Aggregrate Query Fails.Ashish Thusoo 2009-04-23, 00:09
Can you do an explain <query> and send us the plan.
Another thing that you may try is to put the entire subquery in the from clause and then do an aggregation on it.. i.e. select ..., count(1) from (select ... from clickstream_output o join .....) t group by t.description, t.num limit 40; Also are you using the 0.3.0 release candidate? Ashish ________________________________________ From: Matt Pestritto [[EMAIL PROTECTED]] Sent: Wednesday, April 22, 2009 3:53 PM To: [EMAIL PROTECTED] Subject: Re: Aggregrate Query Fails. So the only change was to remove the column alias correct? Still no luck. Same result. On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: That is strange... Does below also fail? select m.description, o_buyers.num , count(1) as total from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ; ________________________________ From: Matt Pestritto <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> Reply-To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> Date: Wed, 22 Apr 2009 15:43:40 -0700 To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> Subject: Re: Aggregrate Query Fails. Thanks. I tried that also earlier: FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> wrote: in the group by, try this instead: group by m.description, buyers limit 40 ; Matt Pestritto wrote: Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails. The queries and output messages are below. Suggestions? thanks in advance. -- works: 2 map-reduces jobs. select m.description, o_buyers.num as buyers from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; Successful output: PROD7362, 3 -- fails: 3 map-reduce jobs - 2nd reduce fails. select m.description, o_buyers.num as buyers, count(1) as total -- sum or max(o_buyers.num) and removing from group by also fails. from clickstream_output o join merchandise m on (o.merchandise_id = m.merchandise_id) left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers on (o_buyers.merchandise_id = o.merchandise_id) where o.file_date >= '20090216' and o.file_date <= '20090217' group by m.description, o_buyers.num limit 40 ; java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
-
Re: Aggregrate Query Fails.Zheng Shao 2009-04-23, 01:35
Hi Matt,
There is a known bug with JOIN - all output columns from JOIN will become STRING. The stacktrace shows this is exactly because of that bug. The workaround right now is to replace all columns with CAST(xxx as STRING), if the column type is not a STRING. select m.description, o_buyers.num as buyers*, count(1) as total* from clickstream_output o join merchandise m on (CAST(o.merchandise_id AS STRING) CAST(m.merchandise_id AS STRING)) left outer join ( select CAST(o1.merchandise_id AS STRING) as merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from clickstream_output o1 where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) o_buyers on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS STRING)) where o.file_date >= '20090216' and o.file_date <= '20090217' *group by m.description, o_buyers.num* limit 40 ; It will be fixed by https://issues.apache.org/jira/browse/HIVE-405. Zheng On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <[EMAIL PROTECTED]> wrote: > Can you do an explain <query> and send us the plan. > > Another thing that you may try is to put the entire subquery in the from > clause and then do an aggregation on it.. > > i.e. > > select ..., count(1) > from (select ... > from clickstream_output o join .....) t > group by t.description, t.num limit 40; > > Also are you using the 0.3.0 release candidate? > > Ashish > > ________________________________________ > From: Matt Pestritto [[EMAIL PROTECTED]] > Sent: Wednesday, April 22, 2009 3:53 PM > To: [EMAIL PROTECTED] > Subject: Re: Aggregrate Query Fails. > > So the only change was to remove the column alias correct? > Still no luck. Same result. > > On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > That is strange... Does below also fail? > > select m.description, o_buyers.num , count(1) as total > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) > as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' and > o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' > group by m.description, o_buyers.num limit 40 ; > > > ________________________________ > From: Matt Pestritto <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> > Reply-To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED] > >> > Date: Wed, 22 Apr 2009 15:43:40 -0700 > To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> > Subject: Re: Aggregrate Query Fails. > > > Thanks. > I tried that also earlier: > FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference > buyers > > On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]< > http://[EMAIL PROTECTED]>> wrote: > > > in the group by, try this instead: > > group by m.description, buyers limit 40 ; > > Matt Pestritto wrote: > Hi - I'm having a problem with a query below. When I try to run any > aggregate function on a column from the sub-query, the job fails. > The queries and output messages are below. > > Suggestions? > > thanks in advance. > > -- works: 2 map-reduces jobs. > select m.description, o_buyers.num as buyers > from clickstream_output o > join merchandise m on (o.merchandise_id = m.merchandise_id) > left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) > as num from clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' and > o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers > on (o_buyers.merchandise_id = o.merchandise_id) > where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40; > > Successful output: PROD7362, 3 Yours, Zheng
-
Re: Aggregrate Query Fails.Jeff Hammerbacher 2009-04-23, 03:22
Hey Zheng,
Thanks for the insight. Perhaps these sorts of quirks could be added to the documentation on the wiki? Thanks, Jeff On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <[EMAIL PROTECTED]> wrote: > Hi Matt, > > There is a known bug with JOIN - all output columns from JOIN will become > STRING. > The stacktrace shows this is exactly because of that bug. > > The workaround right now is to replace all columns with CAST(xxx as > STRING), if the column type is not a STRING. > > > select m.description, o_buyers.num as buyers*, count(1) as total* > from clickstream_output o > join merchandise m on (CAST(o.merchandise_id AS STRING) > CAST(m.merchandise_id AS STRING)) > left outer join ( select CAST(o1.merchandise_id AS STRING) as > merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from > clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) > o_buyers > on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS > STRING)) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > It will be fixed by https://issues.apache.org/jira/browse/HIVE-405. > > Zheng > > > On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <[EMAIL PROTECTED]>wrote: > >> Can you do an explain <query> and send us the plan. >> >> Another thing that you may try is to put the entire subquery in the from >> clause and then do an aggregation on it.. >> >> i.e. >> >> select ..., count(1) >> from (select ... >> from clickstream_output o join .....) t >> group by t.description, t.num limit 40; >> >> Also are you using the 0.3.0 release candidate? >> >> Ashish >> >> ________________________________________ >> From: Matt Pestritto [[EMAIL PROTECTED]] >> Sent: Wednesday, April 22, 2009 3:53 PM >> To: [EMAIL PROTECTED] >> Subject: Re: Aggregrate Query Fails. >> >> So the only change was to remove the column alias correct? >> Still no luck. Same result. >> >> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED] >> <mailto:[EMAIL PROTECTED]>> wrote: >> That is strange... Does below also fail? >> >> select m.description, o_buyers.num , count(1) as total >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' >> group by m.description, o_buyers.num limit 40 ; >> >> >> ________________________________ >> From: Matt Pestritto <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >> Reply-To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED] >> >> >> Date: Wed, 22 Apr 2009 15:43:40 -0700 >> To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >> Subject: Re: Aggregrate Query Fails. >> >> >> Thanks. >> I tried that also earlier: >> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference >> buyers >> >> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]< >> http://[EMAIL PROTECTED]>> wrote: >> >> >> in the group by, try this instead: >> >> group by m.description, buyers limit 40 ; >> >> Matt Pestritto wrote: >> Hi - I'm having a problem with a query below. When I try to run any >> aggregate function on a column from the sub-query, the job fails. >> The queries and output messages are below. >> >> Suggestions? >> >> thanks in advance. >> >> -- works: 2 map-reduces jobs. >> select m.description, o_buyers.num as buyers >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct
-
Re: Aggregrate Query Fails.Matt Pestritto 2009-04-23, 03:38
Zheng,
Eureka! That did it. I had already changed all of my table definitions so all columns were strings so all I needed to do was cast the aggregate function in the join sub-query and that did it for me. -- CAST(count(distinct o1.consumer_id) AS STRING) as num. This would also explain some of the exceptions I was seeing on other joins. Thanks so much -Matt On Wed, Apr 22, 2009 at 9:35 PM, Zheng Shao <[EMAIL PROTECTED]> wrote: > Hi Matt, > > There is a known bug with JOIN - all output columns from JOIN will become > STRING. > The stacktrace shows this is exactly because of that bug. > > The workaround right now is to replace all columns with CAST(xxx as > STRING), if the column type is not a STRING. > > > select m.description, o_buyers.num as buyers*, count(1) as total* > from clickstream_output o > join merchandise m on (CAST(o.merchandise_id AS STRING) > CAST(m.merchandise_id AS STRING)) > left outer join ( select CAST(o1.merchandise_id AS STRING) as > merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from > clickstream_output o1 > where o1.file_date >= '20090216' and o1.file_date <= '20090217' > and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) > o_buyers > on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS > STRING)) > where o.file_date >= '20090216' and o.file_date <= '20090217' > *group by m.description, o_buyers.num* limit 40 ; > > > It will be fixed by https://issues.apache.org/jira/browse/HIVE-405. > > Zheng > > > On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <[EMAIL PROTECTED]>wrote: > >> Can you do an explain <query> and send us the plan. >> >> Another thing that you may try is to put the entire subquery in the from >> clause and then do an aggregation on it.. >> >> i.e. >> >> select ..., count(1) >> from (select ... >> from clickstream_output o join .....) t >> group by t.description, t.num limit 40; >> >> Also are you using the 0.3.0 release candidate? >> >> Ashish >> >> ________________________________________ >> From: Matt Pestritto [[EMAIL PROTECTED]] >> Sent: Wednesday, April 22, 2009 3:53 PM >> To: [EMAIL PROTECTED] >> Subject: Re: Aggregrate Query Fails. >> >> So the only change was to remove the column alias correct? >> Still no luck. Same result. >> >> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED] >> <mailto:[EMAIL PROTECTED]>> wrote: >> That is strange... Does below also fail? >> >> select m.description, o_buyers.num , count(1) as total >> from clickstream_output o >> join merchandise m on (o.merchandise_id = m.merchandise_id) >> left outer join ( select o1.merchandise_id, count(distinct >> o1.consumer_id) as num from clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >> on (o_buyers.merchandise_id = o.merchandise_id) >> where o.file_date >= '20090216' and o.file_date <= '20090217' >> group by m.description, o_buyers.num limit 40 ; >> >> >> ________________________________ >> From: Matt Pestritto <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >> Reply-To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED] >> >> >> Date: Wed, 22 Apr 2009 15:43:40 -0700 >> To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >> Subject: Re: Aggregrate Query Fails. >> >> >> Thanks. >> I tried that also earlier: >> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference >> buyers >> >> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]< >> http://[EMAIL PROTECTED]>> wrote: >> >> >> in the group by, try this instead: >> >> group by m.description, buyers limit 40 ; >> >> Matt Pestritto wrote: >> Hi - I'm having a problem with a query below. When I try to run any >> aggregate function on a column from the sub-query, the job fails. >> The queries and output messages are below. >> >> Suggestions? >> >> thanks in advance. >
-
Re: Aggregrate Query Fails.Zheng Shao 2009-04-23, 08:39
Hi Jeff,
Thanks for the suggestion. That's a good idea. I just added that to http://wiki.apache.org/hadoop/Hive/FAQ Zheng On Wed, Apr 22, 2009 at 8:22 PM, Jeff Hammerbacher <[EMAIL PROTECTED]>wrote: > Hey Zheng, > > Thanks for the insight. Perhaps these sorts of quirks could be added to the > documentation on the wiki? > > Thanks, > Jeff > > > On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <[EMAIL PROTECTED]> wrote: > >> Hi Matt, >> >> There is a known bug with JOIN - all output columns from JOIN will become >> STRING. >> The stacktrace shows this is exactly because of that bug. >> >> The workaround right now is to replace all columns with CAST(xxx as >> STRING), if the column type is not a STRING. >> >> >> select m.description, o_buyers.num as buyers*, count(1) as total* >> from clickstream_output o >> join merchandise m on (CAST(o.merchandise_id AS STRING) >> CAST(m.merchandise_id AS STRING)) >> left outer join ( select CAST(o1.merchandise_id AS STRING) as >> merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from >> clickstream_output o1 >> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >> and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) >> o_buyers >> on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id >> AS STRING)) >> where o.file_date >= '20090216' and o.file_date <= '20090217' >> *group by m.description, o_buyers.num* limit 40 ; >> >> >> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405. >> >> Zheng >> >> >> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <[EMAIL PROTECTED]>wrote: >> >>> Can you do an explain <query> and send us the plan. >>> >>> Another thing that you may try is to put the entire subquery in the from >>> clause and then do an aggregation on it.. >>> >>> i.e. >>> >>> select ..., count(1) >>> from (select ... >>> from clickstream_output o join .....) t >>> group by t.description, t.num limit 40; >>> >>> Also are you using the 0.3.0 release candidate? >>> >>> Ashish >>> >>> ________________________________________ >>> From: Matt Pestritto [[EMAIL PROTECTED]] >>> Sent: Wednesday, April 22, 2009 3:53 PM >>> To: [EMAIL PROTECTED] >>> Subject: Re: Aggregrate Query Fails. >>> >>> So the only change was to remove the column alias correct? >>> Still no luck. Same result. >>> >>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <[EMAIL PROTECTED] >>> <mailto:[EMAIL PROTECTED]>> wrote: >>> That is strange... Does below also fail? >>> >>> select m.description, o_buyers.num , count(1) as total >>> from clickstream_output o >>> join merchandise m on (o.merchandise_id = m.merchandise_id) >>> left outer join ( select o1.merchandise_id, count(distinct >>> o1.consumer_id) as num from clickstream_output o1 >>> where o1.file_date >= '20090216' and o1.file_date <= '20090217' >>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers >>> on (o_buyers.merchandise_id = o.merchandise_id) >>> where o.file_date >= '20090216' and o.file_date <= '20090217' >>> group by m.description, o_buyers.num limit 40 ; >>> >>> >>> ________________________________ >>> From: Matt Pestritto <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >>> Reply-To: <[EMAIL PROTECTED]<http://hive-user@ >>> hadoop.apache.org>> >>> Date: Wed, 22 Apr 2009 15:43:40 -0700 >>> To: <[EMAIL PROTECTED]<http://[EMAIL PROTECTED]>> >>> Subject: Re: Aggregrate Query Fails. >>> >>> >>> Thanks. >>> I tried that also earlier: >>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference >>> buyers >>> >>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <[EMAIL PROTECTED]< >>> http://[EMAIL PROTECTED]>> wrote: >>> >>> >>> in the group by, try this instead: >>> >>> group by m.description, buyers limit 40 ; >>> >>> Matt Pestritto wrote: >>> Hi - I'm having a problem with a query below. When I try to run any >>> aggregate function on a column from the sub-query, the job fails. >>> The queries and output messages are below. Yours, Zheng |