Hive >> mail # user >> Hive mapper creation


Re: Hive mapper creation
Thanks a lot for the valuable response Bejoy. Actually I wanted to
know whether it is possible to set the size of file splits, or the
criterion on which file splits are created (in turn controlling the
number of mappers), for a Hive query. For example, if I want to take
'n' lines from a file as one split instead of each individual row, I
can use NLineInputFormat. Is it possible to do something similar at
Hive's level, or do I need to look into the source code?

Regards,
    Mohammad Tariq
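[For reference: split sizing can be influenced from within a Hive session through MapReduce properties. A minimal sketch; the classic mapred.* property names are used here, and exact names and defaults vary across Hadoop/Hive versions, so treat the values as illustrative.]

```sql
-- Sketch: influence split (and hence mapper) count from a Hive session.
-- Property names/values are illustrative and version-dependent.
SET mapred.min.split.size=134217728;   -- 128 MB lower bound per split
SET mapred.max.split.size=268435456;   -- 256 MB upper bound per split
-- CombineHiveInputFormat lets one mapper read several small files:
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT COUNT(*) FROM my_table;         -- hypothetical table
```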
On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Mohammed
>
> Splits are associated with the MapReduce framework, not with Hive as such. A split is the unit of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). The choice of which HDFS blocks form a split is made with data locality in mind. The number of map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
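[The sizing rule described above can be sketched numerically. This follows the classic FileInputFormat formula, splitSize = max(minSize, min(maxSize, blockSize)); the function names below are invented for illustration.]

```python
# Sketch of how FileInputFormat-style split sizing determines mapper count.
# Classic rule: splitSize = max(minSize, min(maxSize, blockSize)).
import math

def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Effective split size per the classic FileInputFormat rule."""
    return max(min_size, min(max_size, block_size))

def estimate_num_mappers(file_size: int, split_size: int) -> int:
    """One map task per split: roughly ceil(file_size / split_size)."""
    return math.ceil(file_size / split_size) if file_size else 0

MB = 1024 * 1024
# With a 128 MB block and a 256 MB max, the split stays at the block size:
split = compute_split_size(block_size=128 * MB, min_size=1, max_size=256 * MB)
print(split // MB)                                                   # 128
print(estimate_num_mappers(file_size=1000 * MB, split_size=split))   # 8
```

[Raising mapred.min.split.size above the block size is the usual lever to get fewer, larger splits and hence fewer mappers.]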
>
> Hope it is clear. Feel free to revert if you still have any queries.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Hive mapper creation
>
> Hello Nitin, Bejoy,
>
>        Thanks a lot for the quick response. Could you please tell me
> what the default criterion for split creation is? How are the splits
> for a Hive query created? (Pardon my ignorance.)
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> Hi Mohammed
>>
>> Internally, Hive does its processing using MapReduce. So, as in MapReduce, the splits are calculated on job submission and one mapper is assigned per split. A mapper therefore processes a split, not a row.
>>
>> You can store data in various formats such as text, sequence files, RC files etc. There is no restriction to text files alone.
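[The storage formats mentioned correspond to the STORED AS clause in Hive DDL; a sketch, with hypothetical table and column names:]

```sql
-- Sketch: the same table definition in different storage formats.
-- Table/column names are hypothetical.
CREATE TABLE logs_text (line STRING) STORED AS TEXTFILE;
CREATE TABLE logs_seq  (line STRING) STORED AS SEQUENCEFILE;
CREATE TABLE logs_rc   (line STRING) STORED AS RCFILE;
```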
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <[EMAIL PROTECTED]>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Hive mapper creation
>>
>> Hello list,
>>
>>         Since Hive tables are assumed to use text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table? Please correct me if my understanding is wrong. Also, let me
>> know how mappers are created for a Hive query. Many
>> thanks.
>>
>> Regards,
>>     Mohammad Tariq