MapReduce, mail # user - Mapper basic question


Re: Mapper basic question
Manoj Babu 2012-07-11, 15:02
Thanks All!
 On 11 Jul 2012 19:07, "Bejoy KS" <[EMAIL PROTECTED]> wrote:

>
> Hi Manoj
>
> Block size is an HDFS storage-level setting, whereas split size is the amount
> of data processed by each mapper while running a MapReduce job (one split is
> the data processed by one mapper). One or more HDFS blocks can contribute to
> a split. Splits are determined by the InputFormat as well as the min and max
> split size properties.
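
A rough sketch of how these properties interact: FileInputFormat derives the split size from the block size clamped between the min and max split properties, essentially max(minSize, min(maxSize, blockSize)). The snippet below is my own illustration of that formula, not the Hadoop source:

```java
public class SplitSizeDemo {
    // Sketch of the formula FileInputFormat uses:
    // split size = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 64 * mb;
        // With default min/max, the split size equals the block size.
        System.out.println(computeSplitSize(blockSize, 1, Long.MAX_VALUE) / mb);        // 64
        // Raising the min split size yields larger splits, hence fewer mappers.
        System.out.println(computeSplitSize(blockSize, 128 * mb, Long.MAX_VALUE) / mb); // 128
    }
}
```

So raising the min split size (or lowering the max) is the lever the properties give you over mapper count.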
>
> As Arun mentioned, use CombineFileInputFormat and adjust the min and max
> split size properties to control/limit the number of mappers.
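
To see why combining helps, here is a toy simulation of packing many small files into size-capped "meta splits". This is my own illustration, not Hadoop's actual code; the real CombineFileInputFormat also groups blocks by node and rack locality:

```java
import java.util.Arrays;

public class CombineSketch {
    // Greedily pack files into "meta splits" no larger than maxSplitBytes.
    // Illustrative simplification of what CombineFileInputFormat achieves:
    // many small inputs collapse into a few splits, so far fewer map tasks run.
    static int countMetaSplits(long[] fileSizes, long maxSplitBytes) {
        int splits = 0;
        long current = 0;
        for (long size : fileSizes) {
            if (current + size > maxSplitBytes && current > 0) {
                splits++;      // close the current meta split
                current = 0;
            }
            current += size;
        }
        return current > 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] smallFiles = new long[100];
        Arrays.fill(smallFiles, mb); // 100 files of 1 MB each
        // Plain FileInputFormat: one split per file -> 100 map tasks.
        // With a 64 MB cap on combined splits, the same input needs only:
        System.out.println(countMetaSplits(smallFiles, 64 * mb)); // 2
    }
}
```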
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> From: Manoj Babu <[EMAIL PROTECTED]>
> Date: Wed, 11 Jul 2012 18:17:41 +0530
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Mapper basic question
>
> Hi Tariq / Arun,
>
> The no of blocks (splits) = total file size / HDFS block size * replication
> factor.
> The no of splits is again nothing but the blocks here.
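
For concreteness, a quick check of the block arithmetic (my own sketch; note that the replication factor multiplies the stored copies of each block, not the number of blocks or splits a job reads):

```java
public class BlockCountDemo {
    // Blocks in a file: ceil(fileSize / blockSize). Replication creates extra
    // physical copies of each block but does not change the block/split count
    // that a MapReduce job sees.
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // A 200 MB file with a 64 MB block size:
        System.out.println(blockCount(200 * mb, 64 * mb)); // 4 blocks -> 4 map tasks by default
    }
}
```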
>
> Other than increasing the block size (input splits), is it possible to limit
> the no of mappers?
>
>
> Cheers!
> Manoj.
>
>
>
> On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <[EMAIL PROTECTED]>wrote:
>
>> Take a look at CombineFileInputFormat - this will create 'meta splits'
>> which include multiple small splits, thus reducing the #maps which are run.
>>
>> Arun
>>
>> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>>
>> Hi,
>>
>> The no of mappers depends on the no of blocks. Is it possible to limit
>> the no of mappers without increasing the HDFS block size?
>>
>> Thanks in advance.
>>
>> Cheers!
>> Manoj.
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>