Ryabin, Thomas
2012-04-27, 20:48
Bejoy KS
2012-04-28, 05:43
Ryabin, Thomas
2012-04-30, 14:06
Edward Capriolo
2012-04-30, 14:09
Bejoy KS
2012-04-30, 14:14
Ryabin, Thomas
2012-04-30, 16:53
Edward Capriolo
2012-04-30, 17:07
Ryabin, Thomas
2012-04-30, 17:26
-
How to make the query compiler not determine the number of reducers?
Ryabin, Thomas (2012-04-27, 20:48)
Hi,
When I run a query that uses a custom UDF I made, one of the lines it prints out is:

Number of reduce tasks determined at compile time: 1

And this causes the MapReduce job to have only 1 reducer. Is there a way to make it so the compiler does not determine the number of reduce tasks to create, so I can specify the number myself?

The query in question is:

select test_udf(name, store) from employees join stores;

Thanks,
Thomas
-
Re: How to make the query compiler not determine the number of reducers?
Bejoy KS (2012-04-28, 05:43)
Hi Thomas,

Hive automatically sets the number of reducers for you, but you can easily override it in the CLI. Before executing your query, run:

hive> SET mapred.reduce.tasks=n;

where n is the required number of reducers.

Regards,
Bejoy KS
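A minimal CLI sketch of the override described above, assuming a Hive release of this era that still reads the classic `mapred.*` property name (newer Hadoop versions call it `mapreduce.job.reduces`). The two `hive.exec.reducers.*` properties are the inputs Hive uses when it estimates the reducer count on its own; the values below are illustrative, so check your own `hive-default.xml`. As far as I understand, the override has no effect when the compiler pins the count itself, which is what the "determined at compile time" message indicates.

```sql
-- Ask for a fixed number of reducers for the statements that follow in this session.
SET mapred.reduce.tasks=2;

-- With no value, SET prints the current setting.
SET mapred.reduce.tasks;

-- -1 hands the decision back to Hive's own size-based estimate.
SET mapred.reduce.tasks=-1;

-- Input bytes per reducer used by the estimate (illustrative value).
SET hive.exec.reducers.bytes.per.reducer=1000000000;

-- Upper bound on the estimated reducer count (illustrative value).
SET hive.exec.reducers.max=999;
```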
-
RE: How to make the query compiler not determine the number of reducers?
Ryabin, Thomas (2012-04-30, 14:06)
I tried using this to set the number of reduce tasks to 2, but it doesn't work for me. In my case the Hive query always creates 8 map tasks and 1 reduce task. Could the number of reduce tasks be limited by the number of map tasks, so that if I wanted 2 reduce tasks I would need to increase the number of map tasks to 16?

-Thomas
-
Re: How to make the query compiler not determine the number of reducers?
Edward Capriolo (2012-04-30, 14:09)
That is the way to do it. Some operations, such as ORDER BY, are forced into a single reducer, so depending on the query you are running, you may not be able to control the number.
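To illustrate Edward's point, a sketch using the `employees` table from this thread: a global ORDER BY needs one reducer to produce a single totally ordered output, so Hive plans one reducer whatever `mapred.reduce.tasks` says. DISTRIBUTE BY with SORT BY honors the setting, but each reducer's output is only sorted on its own. Prefixing either statement with EXPLAIN shows the compiled plan without running it.

```sql
SET mapred.reduce.tasks=4;

-- Total ordering: the compiler plans a single reducer regardless of the setting above.
SELECT name FROM employees ORDER BY name;

-- Partial ordering: rows are spread over the 4 reducers by hash of name,
-- and each reducer sorts only its own share.
SELECT name FROM employees DISTRIBUTE BY name SORT BY name;
```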
-
Re: How to make the query compiler not determine the number of reducers?
Bejoy KS (2012-04-30, 14:14)
Thomas,
That need not be the case; raising the number of map tasks may have no effect on the number of reduce tasks. Maybe we can help you out if you provide some details, such as:
- the query you are executing
- the output of DESCRIBE FORMATTED for the tables involved in the query

Regards,
Bejoy KS
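A sketch of the diagnostics Bejoy asks for, run in the Hive CLI against the two tables from this thread. EXPLAIN is an extra suggestion, not part of his list: it prints the plan Hive compiles for the query, so you can see how the join is set up before submitting anything.

```sql
-- Table layout, storage handler and SerDe properties.
DESCRIBE FORMATTED employees;
DESCRIBE FORMATTED stores;

-- The compiled plan for the query, without running it.
EXPLAIN select test_udf(name, store) from employees join stores;
```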
-
RE: How to make the query compiler not determine the number of reducers?
Ryabin, Thomas (2012-04-30, 16:53)
The query I am executing is:

select test_udf(name, store) from employees join stores;

My goal for this query is to run every combination of employees.name and stores.store through my test_udf, and have Hadoop spread the computation among the reducers. So if I have 5 rows in the "stores" table and 3 rows in the "employees" table, there would be 15 combinations, and if I had 3 reducers, ideally each reducer would get 5 combinations.

I created the tables with these commands:

create external table employees(row_key string, name string)
  stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  with serdeproperties ("cassandra.columns.mapping" = ":key,name",
  "cassandra.ks.name" = "test",
  "cassandra.cf.name" = "employees");

create external table stores(row_key string, store string)
  stored by 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  with serdeproperties ("cassandra.columns.mapping" = ":key,store",
  "cassandra.ks.name" = "test",
  "cassandra.cf.name" = "stores");

I am using Cassandra as the storage mechanism. I have tried adding an ON clause to my query, like so:

select test_udf(name, store) from employees join stores on (employees.name = stores.store);

In this case Hive creates 3 reduce tasks, but nothing gets done because there are no matching keys. Is there a way to accomplish what I am trying to do by using "distribute by", "cluster by", and/or bucketed tables, or something else?

Thanks,
Thomas
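One commonly used workaround for the goal described above, sketched under assumptions and not tested against the Cassandra storage handler: the single reducer comes from the join having no key, so the sketch gives it one. Each `employees` row is assigned a random slot in 0..2, each `stores` row is copied once per slot with `LATERAL VIEW explode`, and the equi-join on the slot then pairs every employee with every store exactly once while the shuffle spreads the slots across reducers. With 3 employees and 5 stores, the copied `stores` side has 15 rows (5 per slot), and each employee matches the 5 rows in its slot, giving 15 pairs in total. The slot count of 3, the aliases `e`, `s`, `slots` and the column `slot` are hypothetical; the split is only roughly even, since `rand()` picks the slots.

```sql
-- Hypothetical sketch: spread a Cartesian product over 3 reducers.
SET mapred.reduce.tasks=3;

SELECT test_udf(e.name, s.store)
FROM (
  -- each employee lands in exactly one random slot: 0, 1 or 2
  SELECT name, CAST(floor(rand() * 3) AS INT) AS slot
  FROM employees
) e
JOIN (
  -- each store is copied once per slot, so every slot sees every store
  SELECT store, slot
  FROM stores LATERAL VIEW explode(array(0, 1, 2)) slots AS slot
) s
ON (e.slot = s.slot);
```

Raising the slot count (and the array literal to match) gives the shuffle more keys to balance, at the cost of copying the smaller table more times.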
-
Re: How to make the query compiler not determine the number of reducers?
Edward Capriolo (2012-04-30, 17:07)
You are trying to create a Cartesian product.
select * FROM table1,table2 should do that. You do not need a join clause.
-
RE: How to make the query compiler not determine the number of reducers?
Ryabin, Thomas (2012-04-30, 17:26)
Edward,
That is not working for me. I get a syntax error:

hive> select * from employees,stores;
FAILED: Parse Error: line 1:18 mismatched input ',' expecting EOF near 'employees'

-Thomas
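The parse error is consistent with the Hive version in use: as far as I know, the comma-separated (implicit) join notation was only accepted from Hive 0.13, and the explicit CROSS JOIN keyword from Hive 0.10. A sketch of three spellings of the same Cartesian product; in releases older than 0.10 only the first form parses, and it is the form Thomas's original query already uses. None of them has a join key, so on the MapReduce engine discussed here they still run through a single reducer.

```sql
-- Accepted by older releases: JOIN with no ON clause.
SELECT test_udf(name, store) FROM employees JOIN stores;

-- Explicit keyword (Hive 0.10 and later).
SELECT test_udf(name, store) FROM employees CROSS JOIN stores;

-- Implicit join notation (Hive 0.13 and later).
SELECT test_udf(name, store) FROM employees, stores;
```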