Re: MapReduce processing with extra (possibly non-serializable) configuration
Public Network Services 2013-02-22, 06:25
Hazelcast is an interesting idea, but I was hoping that there is a way of
doing this in MapReduce. :-)
It didn't seem like that from the start, but I posted here just to make
sure I was not missing something.
So, I will serialize my data objects and use them accordingly.
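For the archive, a minimal sketch of what that could look like, assuming each split only needs a few plain values extracted from the larger proprietary objects. The names SplitConfigDemo, SplitConfig, parserName, recordLimit and options are hypothetical. The write/readFields pair mirrors the method signatures of Hadoop's Writable interface, but the class deliberately uses only java.io so that it runs on its own; in a real job it would implement org.apache.hadoop.io.Writable and travel inside the custom InputSplit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitConfigDemo {

    /** Hypothetical per-split configuration: only the values a mapper needs. */
    public static final class SplitConfig {
        public String parserName = "";
        public int recordLimit;
        public final Map<String, String> options = new LinkedHashMap<>();

        // Same signature as Writable.write(DataOutput) in Hadoop.
        public void write(DataOutput out) throws IOException {
            out.writeUTF(parserName);
            out.writeInt(recordLimit);
            out.writeInt(options.size());
            for (Map.Entry<String, String> e : options.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        // Same signature as Writable.readFields(DataInput) in Hadoop.
        // Fields must be read back in exactly the order they were written.
        public void readFields(DataInput in) throws IOException {
            parserName = in.readUTF();
            recordLimit = in.readInt();
            options.clear();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                options.put(key, in.readUTF());
            }
        }
    }

    /** Serializes a config to bytes, as would happen when a split is shipped to a task. */
    public static byte[] toBytes(SplitConfig config) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            config.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds a config from bytes, as would happen on the task side. */
    public static SplitConfig fromBytes(byte[] bytes) {
        SplitConfig config = new SplitConfig();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            config.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    public static void main(String[] args) {
        SplitConfig original = new SplitConfig();
        original.parserName = "csv";
        original.recordLimit = 500;
        original.options.put("delimiter", ";");

        // Round trip: the copy is rebuilt purely from the serialized bytes.
        SplitConfig copy = fromBytes(toBytes(original));
        System.out.println(copy.parserName + " " + copy.recordLimit + " " + copy.options);
    }
}
```

The point of the sketch is that only a flattened subset of the object graph has to be made serializable; anything that can be rebuilt from those values on the task side does not need to cross the network.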
On Thu, Feb 21, 2013 at 10:15 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> How do you imagine sending "data" of any kind (be it in object form,
> etc.) over the network to other nodes, without implementing or relying
> on a serialization for it? Are you looking for "easy" Java ways such
> as the distributed cache from Hazelcast, etc., where this may be taken
> care of for you automatically in some way? :)
> On Fri, Feb 22, 2013 at 2:40 AM, Public Network Services
> <[EMAIL PROTECTED]> wrote:
> > Hi...
> > I am trying to put an existing file processing application into Hadoop and
> > need to find the best way of propagating some extra configuration per split,
> > in the form of complex and proprietary custom Java objects.
> > The general idea is:
> > 1. A custom InputFormat splits the input data.
> > 2. The same InputFormat prepares the appropriate configuration for each
> >    split.
> > 3. Hadoop processes each split in MapReduce, using the split itself and the
> >    corresponding configuration.
> > The problem is that these configuration objects contain a lot of
> > and references to other complex objects, and so on, therefore it will take a
> > lot of work to cover all the possible combinations and make the whole
> > serializable (if it can be done in the first place).
> > Most probably this is the only way forward, but if anyone has ever dealt
> > with this problem, please suggest the best approach to follow.
> > Thanks!
> Harsh J