Hive >> mail # user >> Why a sql only use one map task?


Thread:
Daniel,Wu 2011-08-23, 13:51
Vikas Srivastava 2011-08-23, 13:58
Daniel,Wu 2011-08-23, 14:16
Aggarwal, Vaibhav 2011-08-23, 18:28
Daniel,Wu 2011-08-24, 06:43
wd 2011-08-24, 10:19
Daniel,Wu 2011-08-24, 13:39
Ashutosh Chauhan 2011-08-24, 22:35
Daniel,Wu 2011-08-25, 11:38
Daniel,Wu 2011-08-25, 12:02
Re: Re:Re: Re: RE: Why a sql only use one map task?
Hi Daniel
         In the Hadoop ecosystem the number of map tasks is decided by the job, essentially based on the number of input splits. Setting mapred.map.tasks doesn't guarantee that only that many map tasks are launched. What worked here is that you specified a minimum data volume each map task should process by setting a value for mapred.min.split.size.
 So in your case there were really 9 input splits, but once you imposed a constraint on the minimum data a map task should handle, the number of map tasks came down to 3.
Regards
Bejoy K S
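The split-size arithmetic Bejoy describes can be sketched in a few lines. This is a simplified illustration of how Hadoop's FileInputFormat sizes splits (it ignores the real implementation's 10% slop on the last split), using the 64 MB default block size and the ~500 MB file from this thread:

```python
# Simplified sketch of FileInputFormat split sizing: the effective split size
# is max(minSize, min(maxSize, blockSize)), so raising mapred.min.split.size
# above the block size reduces the number of splits, and hence map tasks.

def num_splits(file_size, block_size, min_split=1, max_split=float("inf")):
    split_size = max(min_split, min(max_split, block_size))
    # Full splits plus one remainder split for the leftover bytes.
    full, remainder = divmod(file_size, split_size)
    return int(full + (1 if remainder else 0))

MB = 1024 * 1024

# Default min split: ~8 splits for a 500 MB file with 64 MB blocks.
print(num_splits(500 * MB, 64 * MB))                          # 8
# With mapred.min.split.size=200000000 (~190 MB): 3 splits, as in the thread.
print(num_splits(500 * MB, 64 * MB, min_split=200_000_000))   # 3
```

This matches the behavior observed below: 9 splits collapse to 3 once the minimum split size exceeds the block size.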

-----Original Message-----
From: "Daniel,Wu" <[EMAIL PROTECTED]>
Date: Thu, 25 Aug 2011 20:02:43
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re:Re:Re: Re: RE: Why a sql only use one map task?

after I set
set mapred.min.split.size=200000000;

Then it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size instead of mapred.map.tasks to control how many maps are kicked off.
At 2011-08-25 19:38:30,"Daniel,Wu" <[EMAIL PROTECTED]> wrote:

It works after I set it as you said, but it looks like I can't control the map tasks; it always uses 9 maps, even if I set
set mapred.map.tasks=2;

Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      9           0         0         9          0        0 / 0
reduce   100.00%      1           0         0         1          0        0 / 0

At 2011-08-25 06:35:38,"Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
This may be because CombineHiveInputFormat is combining your splits into one map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
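The combining behavior Ashutosh mentions can be pictured as greedily packing block-sized splits into larger combined splits up to a size cap. The sketch below is a hypothetical illustration, not Hive's actual implementation (CombineHiveInputFormat also considers data locality), but it shows why several blocks can end up in a single map task:

```python
# Hypothetical sketch of split combining: pack per-block splits into combined
# splits no larger than max_combined bytes, flushing when the cap is exceeded.

def combine_splits(split_sizes, max_combined):
    combined, current = [], 0
    for size in split_sizes:
        if current and current + size > max_combined:
            combined.append(current)   # flush the current combined split
            current = 0
        current += size
    if current:
        combined.append(current)
    return combined

MB = 1024 * 1024
blocks = [64 * MB] * 8                          # ~500 MB file, eight 64 MB splits
print(len(combine_splits(blocks, 600 * MB)))    # 1 -> one map task
print(len(combine_splits(blocks, 200 * MB)))    # 3 -> three map tasks
```

With a large enough cap, all eight blocks collapse into a single combined split, which matches the single map task Daniel is seeing.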
2011/8/24 Daniel,Wu<[EMAIL PROTECTED]>

I pasted the information below; the map capacity is 6. And no matter how I set mapred.map.tasks (e.g. 3), it doesn't work; it always uses 1 map task (please see the completed job information).

Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks: 0        Running Reduce Tasks: 0     Total Submissions: 6
Nodes: 3                    Occupied Map Slots: 0       Occupied Reduce Slots: 0
Reserved Map Slots: 0       Reserved Reduce Slots: 0
Map Task Capacity: 6        Reduce Task Capacity: 6     Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0        Excluded Nodes: 0

Completed Jobs
Jobid                  Priority  User    Name                                                  Map %     Map Total  Maps Done  Reduce %  Reduce Total  Reduces Done  Sched. Info  Diag. Info
job_201108242119_0001  NORMAL    oracle  select count(*) from test(Stage-1)                    100.00%   0          0          100.00%   1             1             NA           NA
job_201108242119_0002  NORMAL    oracle  select count(*) from test(Stage-1)                    100.00%   1          1          100.00%   1             1             NA           NA
job_201108242119_0003  NORMAL    oracle  select count(*) from test(Stage-1)                    100.00%   1          1          100.00%   1             1             NA           NA
job_201108242119_0004  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1) 100.00%   1          1          100.00%   3             3             NA           NA
job_201108242119_0005  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1) 100.00%   1          1          100.00%   3             3             NA           NA
job_201108242119_0006  NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1) 100.00%   1          1          100.00%   3             3             NA           NA

At 2011-08-24 18:19:38,wd <[EMAIL PROTECTED]> wrote:
>What about your total Map Task Capacity?
>You can check it from http://your_jobtracker:50030/jobtracker.jsp

>
>2011/8/24 Daniel,Wu <[EMAIL PROTECTED]>:
>> I checked my setting, all are with the default value.So per the book of
>> "Hadoop the definitive guide", the split size should be 64M. And the file
>> size is about 500M, so that's about 8 splits. And from the map job
>> information (after the map job is done), I can see it gets 8 split from one
>> node. But anyhow it starts only one map task.
>>
>>
>>
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <[EMAIL PROTECTED]> wrote:
>>
>> If you actually have splittable files, you can set mapred.max.split.size
>> appropriately to create more splits.
>>
>> Thanks
>>
>> Vaibhav
>>
>>
>>
>> From: Daniel,Wu [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>>
>>
>>
>>   I run the following simple sql
>> select count(*) from sales;
Steven Wong 2011-08-24, 23:01