|
Daniel,Wu
2011-08-23, 13:51
Vikas Srivastava
2011-08-23, 13:58
Daniel,Wu
2011-08-23, 14:16
Aggarwal, Vaibhav
2011-08-23, 18:28
Daniel,Wu
2011-08-24, 06:43
wd
2011-08-24, 10:19
Daniel,Wu
2011-08-24, 13:39
Ashutosh Chauhan
2011-08-24, 22:35
Steven Wong
2011-08-24, 23:01
Daniel,Wu
2011-08-25, 11:38
Daniel,Wu
2011-08-25, 12:02
bejoy_ks@...
2011-08-25, 14:51
|
-
Why a sql only use one map task? Daniel,Wu 2011-08-23, 13:51
I ran the following simple SQL:
select count(*) from sales;
The job information shows that it uses only one map task. The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick off 3 map tasks, one on each node. What could make Hive run only one map task? Do I need to set something to kick off multiple map tasks? I haven't changed the Hive config.
-
Re: Why a sql only use one map task? Vikas Srivastava 2011-08-23, 13:58
Hey, are you storing the data in a zipped format?
If so, that is why it is read as a single split by a single map.
--
With Regards
Vikas Srivastava
DWH & Analytics Team, One97
-
Re:Re: Why a sql only use one map task? Daniel,Wu 2011-08-23, 14:16
No, I didn't use zip. It's just a simple CSV file, which I loaded into Hive with the command
load data local inpath '/home/oracle/sales.csv' into table test;
I am not sure whether this command alone distributes the file evenly across the cluster (3 nodes), so I ran the following command in the hope of spreading the file across the cluster:
create table sales as select * from test;
But when I check the map tasks, it shows 8 splits, all on node test1. If I run
select period_key,count(*) from sales group by period_key
it kicks off ONE map task and 3 reduce tasks. So it looks like it always uses one map task. I have 2 questions:
1: Why doesn't Hadoop distribute the input splits evenly across the nodes? Shouldn't it put 3 splits on each of 2 nodes and 2 splits on the third (3*2 + 2 = 8 splits)?
2: How do I create multiple map tasks?
Input Split Locations
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
-
RE: Why a sql only use one map task? Aggarwal, Vaibhav 2011-08-23, 18:28
If you actually have splittable files, you can create more splits by setting
mapred.max.split.size appropriately.
Thanks
Vaibhav
-
Re:RE: Why a sql only use one map task? Daniel,Wu 2011-08-24, 06:43
I checked my settings; all are at their default values. So per the book "Hadoop: The Definitive Guide", the split size should be 64M. The file size is about 500M, so that's about 8 splits. And from the map job information (after the map job is done), I can see it gets 8 splits from one node. But it still starts only one map task.
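The estimate above can be checked with a quick calculation. This is only a sketch of the arithmetic in the message, using its round figures (a 500M file and a 64M split size); the function name `estimate_splits` is made up for illustration and is not part of Hadoop or Hive.

```python
import math

def estimate_splits(file_size_mb, split_size_mb):
    """Rough number of input splits: file size divided by split size, rounded up."""
    return math.ceil(file_size_mb / split_size_mb)

# Figures quoted in the message: ~500M file, 64M default split size.
print(estimate_splits(500, 64))  # 500 / 64 = 7.8125, rounded up to 8
```

Note that this counts input splits, not map tasks; the input format may still hand all of those splits to a single map task.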
-
Re: RE: Why a sql only use one map task? wd 2011-08-24, 10:19
What about your total Map Task Capacity?
You can check it at http://your_jobtracker:50030/jobtracker.jsp
-
Re:Re: RE: Why a sql only use one map task? Daniel,Wu 2011-08-24, 13:39
I pasted the information below; the map capacity is 6. No matter how I set mapred.map.tasks (to 3, for example), it doesn't work: it always uses 1 map task (please see the completed job information).
Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks: 0
Running Reduce Tasks: 0
Total Submissions: 6
Nodes: 3
Occupied Map Slots: 0
Occupied Reduce Slots: 0
Reserved Map Slots: 0
Reserved Reduce Slots: 0
Map Task Capacity: 6
Reduce Task Capacity: 6
Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0
Excluded Nodes: 0
Completed Jobs
Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 0 | 0 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
-
Re: Re: RE: Why a sql only use one map task? Ashutosh Chauhan 2011-08-24, 22:35
This may be because CombineHiveInputFormat is combining your splits into one map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
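To illustrate the combining behaviour described above, here is a deliberately simplified sketch in Python. It is not Hive's or Hadoop's actual code: it ignores node and rack locality, the function name `combine_blocks` and the block sizes are assumptions chosen to resemble a roughly 500M file, and it only shows the general idea that blocks are packed together until an optional maximum split size is reached.

```python
MB = 1024 * 1024

def combine_blocks(block_sizes, max_split_size=0):
    """Greedily pack blocks into combined splits.

    max_split_size=0 is treated as "no limit", so every block ends up
    in the same split. Locality (node/rack grouping) is ignored here.
    """
    splits, current, current_size = [], [], 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        # Close the current split once it has reached the cap.
        if max_split_size and current_size >= max_split_size:
            splits.append(current)
            current, current_size = [], 0
    if current:
        splits.append(current)  # leftover blocks form the last split
    return splits

# Hypothetical layout: seven full 64 MB blocks plus a 52 MB tail.
blocks = [64 * MB] * 7 + [52 * MB]

print(len(combine_blocks(blocks)))            # 1 (no cap: one combined split)
print(len(combine_blocks(blocks, 256 * MB)))  # 2
print(len(combine_blocks(blocks, 64 * MB)))   # 8
```

Under these assumptions, an unset maximum leaves everything in one split and therefore one map task, which would be consistent with the job tracker output earlier in the thread; a cap such as mapred.max.split.size bounds how much gets combined.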
-
RE: Re:RE: Why a sql only use one map task? Steven Wong 2011-08-24, 23:01
I think mapred.max.split.size is not set by default. The max split size is not the same as the HDFS block size.
-
Re:Re: Re: RE: Why a sql only use one map task? Daniel,Wu 2011-08-25, 11:38
It works after I set it as you said, but it looks like I can't control the number of map tasks: it always uses 9 maps, even if I set
set mapred.map.tasks=2;
Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | 100.00% | 9 | 0 | 0 | 9 | 0 | 0 / 0
reduce | 100.00% | 1 | 0 | 0 | 1 | 0 | 0 / 0
-
Re:Re:Re: Re: RE: Why a sql only use one map task? Daniel,Wu 2011-08-25, 12:02
After I set
set mapred.min.split.size=200000000;
it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size instead of mapred.map.tasks to control how many maps are kicked off.
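The observation above is consistent with the split-size rule of the classic (org.apache.hadoop.mapred) FileInputFormat, where mapred.map.tasks only feeds a "goal" size that is capped by the block size, while mapred.min.split.size acts as a floor. The sketch below models that rule in Python under stated assumptions: a single file of exactly 500,000,000 bytes, a 64 MiB block size, and a 1.1 slop factor for the last split. The function names are made up for illustration, and the exact file size in this thread is not known, which is why the model gives 8 splits for the first case rather than the 9 reported.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size

def compute_split_size(goal_size, min_size, block_size):
    """Split size rule: the block size caps the goal, the minimum is a floor."""
    return max(min_size, min(goal_size, block_size))

def count_splits(file_size, map_tasks_hint, min_split_size, block_size):
    """Count the splits produced for one file under the rule above."""
    goal_size = file_size // max(map_tasks_hint, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)
    remaining, splits = file_size, 0
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the tail becomes the final split
    return splits

FILE_SIZE = 500_000_000         # assumed from "the file I have is 500M"
BLOCK_SIZE = 64 * 1024 * 1024   # assumed 64 MiB HDFS block size

# mapred.map.tasks=2 alone: the goal is 250,000,000 but the block size caps it.
print(count_splits(FILE_SIZE, 2, 1, BLOCK_SIZE))            # 8
# mapred.min.split.size=200000000 raises the floor above the block size.
print(count_splits(FILE_SIZE, 2, 200_000_000, BLOCK_SIZE))  # 3
```

In this model, lowering mapred.map.tasks cannot push the split size above the block size, so the map count stays the same, whereas raising the minimum split size does reduce it.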
-
Re: Re:Re: Re: RE: Why a sql only use one map task? bejoy_ks@... 2011-08-25, 14:51
Hi Daniel
In the Hadoop ecosystem, the number of map tasks is decided by the job, basically from the number of input splits. Setting mapred.map.tasks does not guarantee that exactly that many map tasks are triggered. What worked for you here is that, by setting mapred.min.split.size, you specified a minimum data volume that each map task should process. So in your case there were actually 9 input splits, but when you imposed a constraint on the minimum data a map task should handle, the number of map tasks came down to 3.
Regards
Bejoy K S
|