Re: Hive mapper creation
Mohammad Tariq 2012-06-28, 19:25
Thanks a lot for the valuable response, Bejoy. Actually, I wanted to
know whether it is possible to set the size of file splits, or the
criterion by which file splits are created (which in turn controls the
creation of mappers), for a Hive query. For example, if I want to take
'n' lines from a file as one split instead of taking each individual
row, I can use NLineInputFormat. Is it possible to do something similar
at Hive's level, or do I need to look into the source code?
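To make the "n lines as one split" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop or Hive code; it only imitates the grouping behaviour that NLineInputFormat is meant to provide. The function name `group_lines_into_splits` and the sample rows are hypothetical, chosen just for illustration.

```python
from typing import Iterable, List


def group_lines_into_splits(lines: Iterable[str], n: int) -> List[List[str]]:
    """Group consecutive lines into splits of at most n lines each."""
    if n < 1:
        raise ValueError("n must be at least 1")
    splits: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        current.append(line)
        if len(current) == n:
            # The current split is full, so start a new one.
            splits.append(current)
            current = []
    if current:
        # The last split may hold fewer than n lines.
        splits.append(current)
    return splits


# Hypothetical input: five rows of a table stored as text.
rows = ["row1", "row2", "row3", "row4", "row5"]
splits = group_lines_into_splits(rows, n=2)

# One simulated map task per split, not one per row.
for task_id, split in enumerate(splits):
    print(f"map task {task_id} processes {len(split)} line(s): {split}")
```

With n=2 the five rows fall into three splits, so three simulated map tasks would run rather than five.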
On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Mohammed
> Splits belong to the MapReduce framework, not to Hive specifically. A split is the chunk of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process. (It can be just one block, or more if CombineFileInputFormat is used.) The choice of which HDFS blocks form a split takes data locality into account. The number of mappers (map tasks) created by a job equals the number of splits thus determined, i.e. one map task per split.
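As a rough illustration of how the min and max split size properties interact with the HDFS block size, here is a small Python sketch. It assumes the rule split size = max(min size, min(max size, block size)), which is, to the best of my understanding, what Hadoop's FileInputFormat applies. The split count is a simplification: it ignores the small slop factor Hadoop allows on the last split, and it does not model CombineFileInputFormat or data locality. The function names are hypothetical.

```python
MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Clamp the block size between the min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def estimate_num_splits(file_size: int, split_size: int) -> int:
    """Estimate the number of splits (and so map tasks) for one file.

    Simplified: plain ceiling division, no slop factor.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    if file_size <= 0:
        return 0
    return -(-file_size // split_size)  # ceiling division on integers


file_size = 1024 * MB   # hypothetical 1 GB input file
block_size = 64 * MB    # hypothetical HDFS block size

# With a tiny min size and a huge max size, the split is one block.
default_split = compute_split_size(block_size, min_size=1, max_size=2**63 - 1)
print(estimate_num_splits(file_size, default_split), "map tasks")

# Raising the min split size above the block size yields fewer, larger splits.
larger_split = compute_split_size(block_size, min_size=128 * MB, max_size=2**63 - 1)
print(estimate_num_splits(file_size, larger_split), "map tasks")
```

Under these assumptions, raising the minimum split size reduces the number of map tasks, and lowering the maximum split size below the block size increases it.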
> Hope it is clear. Feel free to revert if you still have any queries.
> Bejoy KS
> Sent from handheld, please excuse typos.
> -----Original Message-----
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Hive mapper creation
> Hello Nitin, Bejoy,
> Thanks a lot for the quick response. Could you please tell me
> what the default criterion for split creation is? How are the splits
> for a Hive query created? (Pardon my ignorance.)
> Mohammad Tariq
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> Hi Mohammed
>> Internally, Hive does its processing using MapReduce. So, as in MapReduce, the splits are calculated at job submission and one mapper is assigned per split. A mapper therefore processes a split, not a single row.
>> You can store data in various formats, such as text, sequence files, RC files, etc. Tables are not restricted to text files.
>> Bejoy KS
>> Sent from handheld, please excuse typos.
>> -----Original Message-----
>> From: Mohammad Tariq <[EMAIL PROTECTED]>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Hive mapper creation
>> Hello list,
>> Since Hive tables are assumed to be of text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table? Please correct me if my understanding is wrong. Also, let me
>> know how mappers are created for a Hive query. Many thanks.
>> Mohammad Tariq