Hadoop >> mail # user >> Re: mapper combiner and partitioner for particular dataset


Re: mapper combiner and partitioner for particular dataset
Got it.
Thanks, Mahesh.

On Tue, Mar 5, 2013 at 1:35 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:

> What Harsh means is that you should create a custom partitioner that
> partitions records based on the input record data (key, value). That is,
> if you have multiple inputs handled by multiple mappers, each of which
> emits key/value pairs, put something specific in the key or value that
> identifies which dataset the record came from (for example, if your value
> is a Text, emit dataset1+value, dataset2+value, etc.). Using this
> information, you can either write multiple Partitioner implementations or
> simply one partitioner that handles all the different cases.
>
> Harsh, please correct me if I am wrong.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
> On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>
>> Thank you for the reply.
>>
>> Can you please elaborate? I do not understand what the following means
>> in a programming environment:
>>
>>
>> you will need a custom written "high level" partitioner and combiner that
>> can create multiple instances of sub-partitioners/combiners and use the
>> most likely one based on their input's characteristics (such as instance
>> type, some tag, config., etc.).
>>
>>
>>
>> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> The MultipleInputs class only supports mapper configuration per dataset.
>>> It does not let you specify a partitioner and combiner as well. You will
>>> need a custom written "high level" partitioner and combiner that can create
>>> multiple instances of sub-partitioners/combiners and use the most likely
>>> one based on their input's characteristics (such as instance type, some
>>> tag, config., etc.).
>>>
>>>
>>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> 1) I have multiple types of datasets as input to my Hadoop job.
>>>>
>>>> I want to write my own input format (e.g. MyTableInputformat) and
>>>> specify the mapper, partitioner, and combiner on a per-dataset basis.
>>>> I know the MultiFileInputFormat class, but it won't help if I want to
>>>> associate a combiner and partitioner class with each dataset; it only
>>>> sets the mapper class per dataset.
>>>>
>>>> 2) I am also looking at the MapTask.java file in the source code.
>>>>
>>>> I just want to know where the mapper, partitioner, and combiner classes
>>>> are set for a particular file split while the job is executing.
>>>>
>>>> Thank you
>>>>
>>>> --
>>>> Thanx and Regards
>>>> Vikas Jadhav
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>>
>> --
>> Thanx and Regards
>> Vikas Jadhav
>>
>
>
--
Thanx and Regards
Vikas Jadhav
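
The "high level" partitioner that Harsh and Mahesh describe in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes each mapper prefixes its output value with a dataset tag followed by a tab (e.g. "dataset1\t..."), and every name here (TaggedPartitionerSketch, SubPartitioner, DispatchingPartitioner, demoPartitioner, the two per-dataset rules) is hypothetical. SubPartitioner is a local stand-in so the sketch compiles without Hadoop on the classpath; in a real job the dispatching class would instead extend org.apache.hadoop.mapreduce.Partitioner and override its getPartition(key, value, numPartitions) method, and the same dispatch idea would have to be repeated for the combiner.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: dispatch to a per-dataset partitioner based on a tag in the value. */
final class TaggedPartitionerSketch {

    /** Local stand-in mirroring the shape of a partitioner's getPartition method. */
    interface SubPartitioner {
        int getPartition(String key, String value, int numPartitions);
    }

    /** Hypothetical "high level" partitioner: reads the dataset tag and delegates. */
    static final class DispatchingPartitioner implements SubPartitioner {
        // Assumed value layout: "<dataset tag>\t<original value>".
        private static final char TAG_SEPARATOR = '\t';

        private final Map<String, SubPartitioner> byTag = new HashMap<>();
        private final SubPartitioner fallback;

        DispatchingPartitioner(SubPartitioner fallback) {
            this.fallback = fallback;
        }

        void register(String tag, SubPartitioner partitioner) {
            byTag.put(tag, partitioner);
        }

        @Override
        public int getPartition(String key, String value, int numPartitions) {
            int sep = value.indexOf(TAG_SEPARATOR);
            // Untagged values get an empty tag and fall through to the fallback.
            String tag = sep < 0 ? "" : value.substring(0, sep);
            String realValue = sep < 0 ? value : value.substring(sep + 1);
            SubPartitioner delegate = byTag.getOrDefault(tag, fallback);
            return delegate.getPartition(key, realValue, numPartitions);
        }
    }

    /** Plain hash partitioning on the key; the mask keeps the result non-negative. */
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Wires up two made-up per-dataset rules for demonstration. */
    static DispatchingPartitioner demoPartitioner() {
        DispatchingPartitioner p =
                new DispatchingPartitioner((k, v, n) -> hashPartition(k, n));
        // dataset1: hash of the whole key.
        p.register("dataset1", (k, v, n) -> hashPartition(k, n));
        // dataset2: partition by the first character of the key.
        p.register("dataset2", (k, v, n) -> k.isEmpty() ? 0 : k.charAt(0) % n);
        return p;
    }

    public static void main(String[] args) {
        DispatchingPartitioner p = demoPartitioner();
        System.out.println(p.getPartition("apple", "dataset1\t1", 4));
        System.out.println(p.getPartition("apple", "dataset2\t1", 4));
    }
}
```

The tag could equally be carried in the key or in a custom Writable; the only requirement is that the partitioner can tell from the record itself which dataset it came from, since it is configured once per job rather than per input.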