MapReduce, mail # user - Re: mapper combiner and partitioner for particular dataset


Re: mapper combiner and partitioner for particular dataset
Mahesh Balija 2013-03-05, 08:05
What Harsh means is that you should write a custom partitioner that
partitions records based on the record data itself (key, value). If you
have multiple inputs, each with its own mapper emitting key/value pairs,
put something in the key or value that identifies which dataset the
record came from (for example, if your value is a Text, emit
"dataset1" + value, "dataset2" + value, etc.). Using that tag, you can
either write multiple Partitioner implementations and dispatch to them
from one top-level partitioner, or simply write one partitioner that
handles all the different cases.
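
A minimal sketch of the "one top-level partitioner" option follows. It is
plain Java so that it compiles without Hadoop on the classpath: the
SubPartitioner interface only mirrors the shape of Hadoop's
Partitioner.getPartition(key, value, numPartitions), and the class name,
the tab separator and the tag strings are all hypothetical choices made
for illustration. In a real job the class would extend
org.apache.hadoop.mapreduce.Partitioner, be registered with
job.setPartitionerClass(...), and build its tag table in a no-argument
constructor, since Hadoop instantiates the partitioner itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in with the same method shape as Hadoop's Partitioner (assumption: String key/value). */
interface SubPartitioner {
    int getPartition(String key, String value, int numPartitions);
}

/** Top-level partitioner that picks a sub-partitioner from the dataset tag in the value. */
final class TagDispatchPartitioner implements SubPartitioner {
    // Hypothetical convention: mappers emit values as "<tag>\t<payload>".
    static final char TAG_SEPARATOR = '\t';

    private final Map<String, SubPartitioner> byTag = new HashMap<>();
    private final SubPartitioner fallback;

    TagDispatchPartitioner(SubPartitioner fallback) {
        this.fallback = fallback;
    }

    /** Associates a dataset tag with the sub-partitioner that should handle it. */
    TagDispatchPartitioner register(String tag, SubPartitioner partitioner) {
        byTag.put(tag, partitioner);
        return this;
    }

    /** Returns the text before the first separator, or "" if the value carries no tag. */
    static String tagOf(String value) {
        int i = value.indexOf(TAG_SEPARATOR);
        return i < 0 ? "" : value.substring(0, i);
    }

    /** Non-negative hash modulo the partition count. */
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public int getPartition(String key, String value, int numPartitions) {
        // Untagged or unknown records go to the fallback partitioner.
        SubPartitioner chosen = byTag.getOrDefault(tagOf(value), fallback);
        return chosen.getPartition(key, value, numPartitions);
    }

    public static void main(String[] args) {
        TagDispatchPartitioner partitioner = new TagDispatchPartitioner(
                (k, v, n) -> hashPartition(k, n))
            // dataset1: plain hash partitioning on the key.
            .register("dataset1", (k, v, n) -> hashPartition(k, n))
            // dataset2: partition on the first character of the key only.
            .register("dataset2", (k, v, n) -> k.isEmpty() ? 0 : k.charAt(0) % n);

        String[][] records = {
            {"user42", "dataset1\tclicked"},
            {"user42", "dataset2\tpurchased"},
            {"user7", "no tag here"},
        };
        for (String[] r : records) {
            System.out.println(r[0] + " / " + tagOf(r[1]) + " -> partition "
                + partitioner.getPartition(r[0], r[1], 4));
        }
    }
}
```

The same dispatch-by-tag idea applies to the combiner. Note that with
this convention the reducer (and combiner) has to strip the tag from the
value before using the payload.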

Harsh, please correct me if I am wrong.

Best,
Mahesh Balija,
Calsoft Labs.

On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:

> Thank you for the reply.
>
> Can you please elaborate? I do not understand what the following means
> in programming terms:
>
>
> you will need a custom written "high level" partitioner and combiner that
> can create multiple instances of sub-partitioners/combiners and use the
> most likely one based on their input's characteristics (such as instance
> type, some tag, config., etc.).
>
>
>
> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> The MultipleInputs class only supports mapper configuration per dataset.
>> It does not let you specify a partitioner and combiner as well. You will
>> need a custom written "high level" partitioner and combiner that can create
>> multiple instances of sub-partitioners/combiners and use the most likely
>> one based on their input's characteristics (such as instance type, some
>> tag, config., etc.).
>>
>>
>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>>
>>>
>>> Hello,
>>>
>>> 1) I have multiple types of datasets as input to my Hadoop job.
>>>
>>> I want to write my own InputFormat (e.g. MyTableInputformat). How do I
>>> specify the mapper, partitioner and combiner on a per-dataset basis?
>>> I know about the MultiFileInputFormat class, but it will not help if I
>>> want to associate a combiner and partitioner class with each dataset;
>>> it only sets the mapper class per dataset.
>>>
>>> 2) I am also looking at the MapTask.java file in the source code.
>>>
>>> I just want to know where the mapper, partitioner and combiner classes
>>> are set for a particular file split while the job is executing.
>>>
>>> Thank you
>>>
>>> --
>>> Thanx and Regards
>>> Vikas Jadhav
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Thanx and Regards
> Vikas Jadhav
>