MapReduce >> mail # user >> MapReduce processing with extra (possibly non-serializable) configuration


Public Network Services 2013-02-21, 21:10
Re: MapReduce processing with extra (possibly non-serializable) configuration
I have one simple suggestion for you: write a custom split to replace
FileSplit and include all of your special configuration in that split.
Then write a custom InputFormat that produces these splits.

During the map phase, you can get the split and read all of the special
configuration from it.
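
A minimal sketch of that idea, under some assumptions: the class name
ConfiguredSplit, its fields, and the roundTrip helper are hypothetical,
and the configuration is reduced to a flat map of strings. In a real job
the class would extend org.apache.hadoop.mapreduce.InputSplit and
implement org.apache.hadoop.io.Writable; the sketch uses only java.io so
it runs without Hadoop on the classpath, but write/readFields follow the
same contract. Note that the split still has to be serialized to reach
the task, so this only helps if the configuration can be reduced to
simple values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical split that carries per-split settings next to the file range. */
final class ConfiguredSplit {
    private String path = "";
    private long start;
    private long length;
    private final Map<String, String> settings = new LinkedHashMap<>();

    /** No-arg constructor: the split is re-created empty, then filled by readFields. */
    ConfiguredSplit() {}

    ConfiguredSplit(String path, long start, long length, Map<String, String> settings) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.settings.putAll(settings);
    }

    /** Same shape as Writable.write: serialize every field in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(start);
        out.writeLong(length);
        out.writeInt(settings.size());
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Same shape as Writable.readFields: read the fields back in the same order. */
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        start = in.readLong();
        length = in.readLong();
        settings.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            settings.put(key, in.readUTF());
        }
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }
    Map<String, String> getSettings() { return settings; }

    /** Simulates the trip from job submission to the map task: write, then read into a fresh instance. */
    static ConfiguredSplit roundTrip(ConfiguredSplit split) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            split.write(out);
        }
        ConfiguredSplit copy = new ConfiguredSplit();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }
}
```

In the mapper, the split would then be available from the task context
(context.getInputSplit() in the org.apache.hadoop.mapreduce API) and can
be cast back to the custom type to read the settings.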

On Fri, Feb 22, 2013 at 5:10 AM, Public Network Services <[EMAIL PROTECTED]> wrote:

> Hi...
>
> I am trying to put an existing file processing application into Hadoop and
> need to find the best way of propagating some extra configuration per
> split, in the form of complex and proprietary custom Java objects.
>
> The general idea is
>
>    1. A custom InputFormat splits the input data
>    2. The same InputFormat prepares the appropriate configuration for
>    each split
>    3. Hadoop processes each split in MapReduce, using the split itself
>    and the corresponding configuration
>
> The problem is that these configuration objects contain a lot of
> properties and references to other complex objects, and so on, therefore it
> will take a lot of work to cover all the possible combinations and make the
> whole thing serializable (if it can be done in the first place).
>
> Most probably this is the only way forward, but if anyone has ever dealt
> with this problem, please suggest the best approach to follow.
>
> Thanks!
>
>
Public Network Services 2013-02-22, 04:11
feng lu 2013-02-22, 01:55
Public Network Services 2013-02-22, 04:09
Harsh J 2013-02-22, 06:15
Public Network Services 2013-02-22, 06:26