Re: Re:Re: Re: RE: Why a sql only use one map task?
Hi Daniel
         In the Hadoop ecosystem, the number of map tasks is decided by the job, basically based on the number of input splits. Setting mapred.map.tasks does not guarantee that only that many map tasks are triggered. What worked for you here is that you told the job that each map task should process a minimum data volume by setting mapred.min.split.size.
So in your case there were really 9 input splits, but once you imposed a constraint on the minimum data a map task should handle, the number of map tasks came down to 3.
Regards
Bejoy K S
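
For anyone skimming the thread, here is a minimal Hive CLI sketch of that point, assuming the usual split-size rule (split size is roughly max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size)), and the number of map tasks is roughly file size divided by split size):

hive> -- mapred.map.tasks is only a hint to the InputFormat, not a hard limit
hive> set mapred.map.tasks=2;
hive> -- the split settings are what actually change the number of map tasks:
hive> set mapred.min.split.size=200000000;   -- force larger splits, hence fewer map tasks
hive> select count(*) from test;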

-----Original Message-----
From: "Daniel,Wu" <[EMAIL PROTECTED]>
Date: Thu, 25 Aug 2011 20:02:43
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re:Re:Re: Re: RE: Why a sql only use one map task?

After I set
set mapred.min.split.size=200000000;

it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size instead of mapred.map.tasks to control how many map tasks get kicked off.
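
Rough arithmetic for those numbers, assuming splits are cut at roughly the configured minimum size:

hive> set mapred.min.split.size=200000000;   -- about 190 MB, which overrides the 64 MB block size
hive> -- ~500 MB file / ~190 MB per split  =>  3 splits  =>  3 map tasks
hive> select count(*) from test;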
At 2011-08-25 19:38:30,"Daniel,Wu" <[EMAIL PROTECTED]> wrote:

It works after I set it as you said, but it looks like I can't control the number of map tasks; it always uses 9 maps, even if I set
set mapred.map.tasks=2;
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      9           0         0         9          0        0 / 0
reduce   100.00%      1           0         0         1          0        0 / 0

At 2011-08-25 06:35:38,"Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
This may be because CombineHiveInputFormat is combining your splits in one map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
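
In other words, a sketch of the two alternatives (CombineHiveInputFormat merges small splits into fewer map tasks, while the plain HiveInputFormat keeps one map task per input split):

hive> -- combine splits into fewer map tasks (the behaviour seen above):
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
hive> -- one map task per input split:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> select count(*) from test;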
2011/8/24 Daniel,Wu<[EMAIL PROTECTED]>

I pasted the information below; the map capacity is 6. And no matter how I set mapred.map.tasks, such as 3, it doesn't work: it always uses 1 map task (please see the completed job information).

Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks: 0     Running Reduce Tasks: 0    Total Submissions: 6    Nodes: 3
Occupied Map Slots: 0    Occupied Reduce Slots: 0   Reserved Map Slots: 0   Reserved Reduce Slots: 0
Map Task Capacity: 6     Reduce Task Capacity: 6    Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0     Excluded Nodes: 0
Completed Jobs
Jobid                  Priority  User    Name                                                     Map % Complete  Map Total  Maps Completed  Reduce % Complete  Reduce Total  Reduces Completed  Job Scheduling Information  Diagnostic Info
job_201108242119_0001  NORMAL    oracle  select count(*) from test(Stage-1)                       100.00%         0          0               100.00%            1             1                  NA                          NA
job_201108242119_0002  NORMAL    oracle  select count(*) from test(Stage-1)                       100.00%         1          1               100.00%            1             1                  NA                          NA
job_201108242119_0003  NORMAL    oracle  select count(*) from test(Stage-1)                       100.00%         1          1               100.00%            1             1                  NA                          NA
job_201108242119_0004  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)   100.00%         1          1               100.00%            3             3                  NA                          NA
job_201108242119_0005  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)   100.00%         1          1               100.00%            3             3                  NA                          NA
job_201108242119_0006  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)   100.00%         1          1               100.00%            3             3                  NA                          NA

At 2011-08-24 18:19:38,wd <[EMAIL PROTECTED]> wrote:
>What about your total Map Task Capacity?
>you may check it from http://your_jobtracker:50030/jobtracker.jsp

>
>2011/8/24 Daniel,Wu <[EMAIL PROTECTED]>:
>> I checked my settings; all are at the default values. So per the book
>> "Hadoop: The Definitive Guide", the split size should be 64M. And the file
>> size is about 500M, so that's about 8 splits. And from the map job
>> information (after the map job is done), I can see it gets 8 splits from one
>> node. But anyhow it starts only one map task.
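
One quick way to double-check which values are actually in effect (a small sketch; in the Hive CLI, "set property-name;" with no value just prints the current setting):

hive> set dfs.block.size;           -- HDFS block size (64 MB default in this era)
hive> set mapred.min.split.size;
hive> set mapred.map.tasks;         -- a hint only, not a hard limit
hive> set hive.input.format;        -- which Hive input format is in use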
>>
>>
>>
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <[EMAIL PROTECTED]> wrote:
>>
>> If you actually have splittable files you can set the following setting to
>> create more splits:
>>
>>
>>
>> mapred.max.split.size appropriately.
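
For example, a sketch of that suggestion with the file size from this thread (assuming a splittable format such as plain text, and that the input format in use honours this property):

hive> -- cap splits at ~50 MB so the ~500 MB file yields around 10 map tasks
hive> set mapred.max.split.size=50000000;
hive> select count(*) from sales;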
>>
>>
>>
>> Thanks
>>
>> Vaibhav
>>
>>
>>
>> From: Daniel,Wu [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>>
>>
>>
>>   I run the following simple sql
>> select count(*) from sales;