-
Distributed execution for UNION ALL
Alexander Goryunov 2012-05-04, 12:52
Hello,
I have a query like
SELECT * FROM (
  SELECT 1, concat(1_timestamp, ', ', 2_account_id)
  FROM table_1 WHERE 2_account_id = 1132576 LIMIT 1000000000
  UNION ALL
  SELECT 2, concat(1_timestamp, ', ', 2_account_id)
  FROM table_2 WHERE 2_account_id = 1132576 LIMIT 1000000000
  UNION ALL
  SELECT 3, concat(1_timestamp, ', ', 2_account_id)
  FROM table_3 WHERE 2_account_id = 1132576 LIMIT 1000000000
  UNION ALL
  .... // some hundred tables here
) res;
Parallel job execution is set to true in the Hive config, and the query creates the maximum number of MapReduce map tasks, all on the requesting node.
What should be done to distribute these jobs across all the cluster nodes?
Thanks.
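The query above repeats the same branch once per table, so with hundreds of tables it is usually easier to generate than to type. A minimal Python sketch, assuming (hypothetically) that the tables really are named table_1 to table_N and that each has the columns 1_timestamp and 2_account_id; build_union_query is an illustrative name, not part of Hive:

```python
def build_union_query(num_tables, account_id, limit=1000000000):
    """Build the UNION ALL query over table_1 .. table_<num_tables>.

    Hypothetical assumption: every table is named table_<i> and has the
    columns 1_timestamp and 2_account_id, as in the query above.
    """
    branches = []
    for i in range(1, num_tables + 1):
        # One branch per table; the literal i tags which table a row came from.
        branches.append(
            f"  SELECT {i}, concat(1_timestamp, ', ', 2_account_id)\n"
            f"  FROM table_{i} WHERE 2_account_id = {account_id} LIMIT {limit}"
        )
    # Same shape as the hand-written query: branches joined by UNION ALL,
    # wrapped in an outer SELECT * FROM ( ... ) res.
    return "SELECT * FROM (\n" + "\nUNION ALL\n".join(branches) + "\n) res;"
```

The resulting string can then be run through the Hive CLI, for example with hive -e.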
-
Re: Distributed execution for UNION ALL
Bejoy Ks 2012-05-04, 13:07
Hi Alexander,
Since the tasks are executing only on the local node, it looks like the Hive MapReduce jobs are running in local mode. What is the value of mapred.job.tracker in your job.xml or in mapred-site.xml?
Regards Bejoy KS
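Bejoy's check can be scripted. A minimal sketch, assuming the standard Hadoop configuration XML layout (configuration, property, name, value elements) and Hadoop 1.x semantics, where mapred.job.tracker defaults to "local" and that value makes MapReduce jobs run in-process on the submitting machine; get_property and runs_locally are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name):
    """Return the <value> of property `name` from a Hadoop *-site.xml or
    job.xml string, or None if the property is not present."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def runs_locally(config_xml):
    """True if mapred.job.tracker is unset (so the default 'local' applies)
    or explicitly 'local', i.e. jobs run in-process instead of on a JobTracker."""
    value = get_property(config_xml, "mapred.job.tracker")
    return value is None or value == "local"
```

For the value Alexander reports in the thread (full.namenode.hostname:8021), runs_locally returns False, meaning the jobs are submitted to a JobTracker rather than run in local mode.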
-
Re: Distributed execution for UNION ALL
Alexander Goryunov 2012-05-04, 13:23
Hi Bejoy KS,
Thanks for your answer.
From job.xml:
mapred.job.tracker=full.namenode.hostname:8021
-
Re: Distributed execution for UNION ALL
Bejoy KS 2012-05-04, 13:58
Hi Alexander,
Do you see this single-node execution issue only for this particular query with UNION ALL, or is it the same across all Hive queries?
Regards Bejoy KS
Sent from handheld, please excuse typos.
-
Re: Distributed execution for UNION ALL
Alexander Goryunov 2012-05-04, 14:05
I have this issue with all Hive queries.
In fact, I've tried only two types of queries: the ones with UNION ALL and simple queries like SELECT field1, field2 FROM SOME_TABLE WHERE field2=SOME_CONST;
Thanks.