Hive, mail # user - Hive mapper creation

Re: Hive mapper creation
Mohammad Tariq 2012-06-28, 19:35
Ok Bejoy. I'll proceed as directed by you and get back to you in case
of any difficulty. Thanks again for the help.

Regards,
    Mohammad Tariq
On Fri, Jun 29, 2012 at 12:59 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>  Hi Mohammad
>
> If it is to control the split size and thereby the number of map tasks, you just need to play with the min and max split size properties.
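
A minimal sketch of what "playing with the min and max split size properties" can look like from the Hive CLI, assuming the Hadoop 1.x-era property names current in 2012; the byte values and the table name are illustrative only:

    -- larger max split size => fewer, larger splits => fewer map tasks
    set mapred.min.split.size=134217728;   -- 128 MB lower bound per split
    set mapred.max.split.size=268435456;   -- 256 MB upper bound per split
    select count(*) from my_table;         -- my_table is a hypothetical table

These are session-level settings, so they only affect the MapReduce jobs launched for queries in the current Hive session.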
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Fri, 29 Jun 2012 00:55:54
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Hive mapper creation
>
> Thanks a lot for the valuable response, Bejoy. Actually I wanted to
> know whether it is possible to set the size of file splits, or the criterion
> on which file splits are created (in turn controlling the creation of
> mappers), for a Hive query. For example, if I want to take 'n' lines
> from a file as one split instead of taking each individual row, I can
> use NLineInputFormat. Is it possible to do something similar at Hive's
> level, or do I need to look into the source code?
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> Hi Mohammad
>>
>> Splits belong to the MapReduce framework, not to Hive specifically. A split is the data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). Which HDFS blocks form a split is determined with data locality in mind. The number of map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
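
As a rough illustration of how this plays out on the Hive side, assuming CombineHiveInputFormat (typically Hive's default input format at the time) and purely illustrative sizes:

    -- group HDFS blocks into combined splits, capped at ~256 MB each
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set mapred.max.split.size=268435456;
    -- e.g. a 1 GB table stored as 16 x 64 MB blocks would then be read as
    -- roughly 4 splits (and 4 map tasks) instead of 16, subject to locality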
>>
>> Hope it is clear. Feel free to revert if you still have any queries.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <[EMAIL PROTECTED]>
>> Date: Fri, 29 Jun 2012 00:29:13
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Re: Hive mapper creation
>>
>> Hello Nitin, Bejoy,
>>
>>        Thanks a lot for the quick response. Could you please tell me
>> what the default criterion for split creation is? How are the splits for a
>> Hive query created? (Pardon my ignorance.)
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>>> Hi Mohammad
>>>
>>> Internally, Hive does its processing using MapReduce. So, as in MapReduce, the splits are calculated on job submission and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>>>
>>> You can store data in various formats such as text, sequence files, RC files, etc. There is no restriction to text files alone.
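
For example, a minimal sketch of declaring non-text storage formats at table-creation time (table and column names are hypothetical):

    -- RCFile-backed table
    create table page_views_rc (user_id bigint, url string)
    stored as rcfile;

    -- SequenceFile-backed table
    create table page_views_seq (user_id bigint, url string)
    stored as sequencefile;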
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: Mohammad Tariq <[EMAIL PROTECTED]>
>>> Date: Fri, 29 Jun 2012 00:17:05
>>> To: user<[EMAIL PROTECTED]>
>>> Reply-To: [EMAIL PROTECTED]
>>> Subject: Hive mapper creation
>>>
>>> Hello list,
>>>
>>>         Since Hive tables are assumed to use a text input format, is
>>> it right to assume that a mapper is created per row of a particular
>>> table? Please correct me if my understanding is wrong. Also, let me
>>> know how mappers are created for a Hive query. Many
>>> thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq