Hive >> mail # user >> How does hive decide to launch how many map tasks?


Cheng Su 2012-11-16, 06:39
Re: How does hive decide to launch how many map tasks?
Hi Cheng

The number of input splits, and therefore the number of map tasks, is determined by the InputFormat in use and by the split size.

Hive uses CombineHiveInputFormat, so you may not get one mapper per file if your files are small. You can control the number of map tasks by controlling the split sizes:
mapred.min.split.size
mapred.max.split.size
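
As a rough sketch of how these two properties might be set in a Hive session before running the query: the values below are illustrative assumptions only, not recommendations, and `my_table` is a hypothetical table name. Both properties take a size in bytes.

```sql
-- Illustrative values only; tune them to your data size and cluster.
-- A smaller maximum split size limits how much data CombineHiveInputFormat
-- packs into a single split, which tends to produce more map tasks.
set mapred.max.split.size=33554432;  -- 32 MB in bytes (assumed value)
set mapred.min.split.size=1;         -- assumed value

-- my_table is a hypothetical table name.
select count(*) from my_table;
```

Whether more map tasks actually improve performance depends on the data volume; for a few small files, a single mapper may already be the cheapest plan.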

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Cheng Su <[EMAIL PROTECTED]>
Date: Fri, 16 Nov 2012 14:39:57
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: How does hive decide to launch how many map tasks?

Hi, all

How does hive decide to launch how many map tasks?
I know there are some configs that help Hive decide how many reduce
tasks to launch.
But what about map tasks?

I thought that the number of map tasks equals the number of stored files.
I have a table with 2 partitions; one has 4 files in it and the
other has 2.
When I execute "select count(*) from table", only one map task is launched.

How can I increase the number of map tasks to improve the performance?

Thanks.

--

Regards,
Cheng Su