Hadoop, mail # user - Re: mapper combiner and partitioner for particular dataset


Re: mapper combiner and partitioner for particular dataset
Vikas Jadhav 2013-03-06, 09:38
Got it.
Thanks, Mahesh.

On Tue, Mar 5, 2013 at 1:35 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:

> What Harsh means is that you should write a custom partitioner that
> partitions records based on the record data itself (key, value). If you
> have multiple inputs handled by multiple mappers, each emitting key/value
> pairs, put something in the key or value that identifies which dataset
> the record came from (for example, if your value is a Text, emit
> dataset1+value, dataset2+value, etc.). Using this information, you can
> either write multiple Partitioner implementations or simply one
> partitioner that handles all the different cases.
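
A minimal sketch of the "one partitioner handling all cases" idea, in plain Java so it compiles without Hadoop on the classpath. The SimplePartitioner interface is a hypothetical stand-in that mirrors the getPartition(key, value, numPartitions) method of org.apache.hadoop.mapreduce.Partitioner. The "dataset1|" / "dataset2|" value prefix, the '|' separator, and the per-dataset rules are assumptions made up for illustration.

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Partitioner, so the
// sketch compiles without Hadoop. A real job would extend that class instead.
interface SimplePartitioner {
    int getPartition(String key, String value, int numPartitions);
}

// One partitioner for every dataset. It inspects a tag that the mappers are
// assumed to prepend to each value, e.g. "dataset1|..." or "dataset2|...".
final class TagAwarePartitioner implements SimplePartitioner {
    static final char SEP = '|'; // assumed tag separator

    @Override
    public int getPartition(String key, String value, int numPartitions) {
        int sep = value.indexOf(SEP);
        String tag = sep < 0 ? "" : value.substring(0, sep);
        switch (tag) {
            case "dataset1": {
                // Illustrative rule: hash partitioning on the whole key.
                return nonNegativeMod(key.hashCode(), numPartitions);
            }
            case "dataset2": {
                // Illustrative rule: partition on the key's first character only.
                int h = key.isEmpty() ? 0 : key.charAt(0);
                return nonNegativeMod(h, numPartitions);
            }
            default: {
                // Untagged records all go to partition 0.
                return 0;
            }
        }
    }

    // Clears the sign bit so the result is always in [0, n).
    static int nonNegativeMod(int hash, int n) {
        return (hash & Integer.MAX_VALUE) % n;
    }
}
```

If records from both datasets have to meet at the same reducer (a join, for example), every branch must send equal keys to the same partition. Different rules per dataset only make sense when the datasets are reduced independently.
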
>
> Harsh, please correct me if I am wrong.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
> On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>
>> Thank you for the reply.
>>
>> Can you please elaborate? I do not understand what the following means
>> in programming terms:
>>
>>
>> you will need a custom written "high level" partitioner and combiner that
>> can create multiple instances of sub-partitioners/combiners and use the
>> most likely one based on their input's characteristics (such as instance
>> type, some tag, config., etc.).
>>
>>
>>
>> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> The MultipleInputs class only supports mapper configuration per dataset.
>>> It does not let you specify a partitioner and combiner as well. You will
>>> need a custom written "high level" partitioner and combiner that can create
>>> multiple instances of sub-partitioners/combiners and use the most likely
>>> one based on their input's characteristics (such as instance type, some
>>> tag, config., etc.).
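
A rough sketch of such a "high level" combiner, again in plain Java with no Hadoop dependency. SubCombiner, SumCombiner, MaxCombiner, and DispatchingCombiner are hypothetical names, and the "dataset1:" / "dataset2:" key prefix is an assumed tagging convention. In a real job the dispatcher would be a Reducer subclass registered with Job.setCombinerClass, with the sub-combiners created once in setup().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a combiner: folds the values seen for one key
// into a single partial result.
interface SubCombiner {
    long combine(List<Long> values);
}

final class SumCombiner implements SubCombiner {
    @Override
    public long combine(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }
}

final class MaxCombiner implements SubCombiner {
    @Override
    public long combine(List<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) max = Math.max(max, v);
        return max;
    }
}

// "High level" combiner: creates the sub-combiners once and picks one per key,
// based on a tag that the mappers are assumed to prepend to the key.
final class DispatchingCombiner {
    private final Map<String, SubCombiner> byTag = new HashMap<>();

    DispatchingCombiner() {
        byTag.put("dataset1", new SumCombiner());
        byTag.put("dataset2", new MaxCombiner());
    }

    long combine(String taggedKey, List<Long> values) {
        int sep = taggedKey.indexOf(':'); // assumed tag separator
        String tag = sep < 0 ? "" : taggedKey.substring(0, sep);
        SubCombiner delegate = byTag.get(tag);
        if (delegate == null) {
            throw new IllegalArgumentException("no combiner registered for tag: " + tag);
        }
        return delegate.combine(values);
    }
}
```

Tagging the key means "dataset1:foo" and "dataset2:foo" are grouped separately, which is what lets each group go through its own sub-combiner. The same dispatch pattern applies to a partitioner that holds sub-partitioners.
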
>>>
>>>
>>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello,
>>>>
>>>> 1) I have multiple types of datasets as input to my Hadoop job.
>>>>
>>>> I want to write my own InputFormat (e.g. MyTableInputformat). How can I
>>>> specify the mapper, partitioner, and combiner on a per-dataset basis?
>>>> I know about the MultiFileInputFormat class, but it does not help if I
>>>> want to associate a combiner and partitioner class; it only sets the
>>>> mapper class per dataset.
>>>>
>>>> 2) I am also looking at the MapTask.java file in the source code.
>>>>
>>>> I just want to know where the mapper, partitioner, and combiner classes
>>>> are set for a particular FileSplit while the job executes.
>>>>
>>>> Thank You
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Vikas Jadhav
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Vikas Jadhav
>>
>
>
--
Thanks and Regards,
Vikas Jadhav