MapReduce processing with extra (possibly non-serializable) configuration
Public Network Services 2013-02-21, 21:10
I am trying to port an existing file-processing application to Hadoop and
need to find the best way of propagating some extra configuration per
split, in the form of complex, proprietary custom Java objects.
The general idea is:
1. A custom InputFormat splits the input data
2. The same InputFormat prepares the appropriate configuration for each
split
3. Hadoop processes each split in MapReduce, using the split itself and
the corresponding configuration
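
To make the three steps concrete, here is a minimal sketch of a split
descriptor that carries its own settings. Everything in it is hypothetical
(ConfiguredSplit, the settings map, the sample path), and it deliberately
does not import Hadoop: write/readFields only mirror the shape of Hadoop's
Writable contract so the example runs standalone. It also assumes the
per-split configuration can be flattened to string key/value pairs, which
is exactly what my real objects may not allow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical split descriptor: a file region plus its per-split settings.
// Assumption: the settings can be reduced to plain string pairs.
class ConfiguredSplit {
    String path;
    long start;
    long length;
    Map<String, String> settings = new TreeMap<>();

    // No-arg constructor so an empty instance can be filled by readFields.
    ConfiguredSplit() {}

    ConfiguredSplit(String path, long start, long length, Map<String, String> settings) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.settings.putAll(settings);
    }

    // Step 2: the configuration is written together with the split.
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(start);
        out.writeLong(length);
        out.writeInt(settings.size());
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Step 3: the task side rebuilds the split and its configuration.
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        start = in.readLong();
        length = in.readLong();
        settings.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            settings.put(key, in.readUTF());
        }
    }
}

class ConfiguredSplitDemo {
    // Serializes a split to bytes and reads it back into a fresh instance.
    static ConfiguredSplit roundTrip(ConfiguredSplit split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            ConfiguredSplit copy = new ConfiguredSplit();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ConfiguredSplit original = new ConfiguredSplit(
                "input/part-0.dat", 0L, 4096L, Map.of("delimiter", "|"));
        ConfiguredSplit copy = roundTrip(original);
        System.out.println(copy.path + " " + copy.settings);
    }
}
```

In a real job the same pair of methods would presumably live on a custom
InputSplit subclass that implements Writable, but I have not verified that
wiring here.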
The problem is that these configuration objects contain a lot of properties
and references to other complex objects, which in turn reference others.
It will therefore take a lot of work to cover all the possible combinations
and make the whole thing serializable (if it can be done in the first place).
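
To illustrate the kind of failure I expect, here is a small probe using
plain Java serialization. The class names (LegacyParser, SplitSettings,
SerializabilityProbe) are made up for this example and stand in for the
proprietary objects; the point is only that a single non-serializable
reference deep in the graph is enough to break the whole object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a proprietary class that is not Serializable.
class LegacyParser {}

// Hypothetical configuration object: Serializable itself, but it holds a
// reference to something that is not.
class SplitSettings implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "example";
    LegacyParser parser = new LegacyParser();
}

class SerializabilityProbe {
    // Returns null if the object graph serializes cleanly; otherwise returns
    // the message of the NotSerializableException, which names the class
    // that blocked serialization.
    static String firstBlocker(Object root) {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("String: " + firstBlocker("plain text"));
        System.out.println("SplitSettings: " + firstBlocker(new SplitSettings()));
    }
}
```

Running a probe like this over each top-level configuration object would at
least show how many classes need attention before committing to the full
rewrite.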
Most probably this is the only way forward, but if anyone has ever dealt
with this problem, please suggest the best approach to follow.
Azuryy Yu 2013-02-22, 01:57
Public Network Services 2013-02-22, 04:11
feng lu 2013-02-22, 01:55
Public Network Services 2013-02-22, 04:09
Harsh J 2013-02-22, 06:15
Public Network Services 2013-02-22, 06:26