Passing data via Configuration
Peter Cogan 2013-02-08, 15:15
Hi,
I have data stored in an object that I want to pass into my Mapper.
I see from Configuration that there are setters and getters for primitives, but is there a way of doing this with non-primitives, either my own classes or built-in classes (such as HashMap)?
Thanks!
Peter
Re: Passing data via Configuration
Robert Evans 2013-02-08, 15:23
You could, but this is generally discouraged. Pig does something like this: it serializes the object into a byte array, Base64-encodes the bytes into a string, and puts that string in the config. The problem is that the config can grow very large. In the 1.0 line of Hadoop, the maximum size of the job's config is limited so that the JobTracker does not run out of memory. In V2 this is less of a concern, because it is your own application master that has to read it all in.
In general, if it is a very small amount of data you can play games like this; if it is a large amount of data, you probably want to use the distributed cache instead.
--Bobby
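For readers who want to see the serialize-then-Base64 trick described above, here is a minimal sketch using only the Java standard library. It is not Pig's actual code. The class name ConfigPayload, the key "example.lookup", and the plain Map used in place of Hadoop's Configuration are all assumptions made so the example runs without Hadoop on the classpath. In a real job the same strings would presumably go through Configuration.set(String, String) in the driver and Configuration.get(String) in the Mapper's setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a small Serializable object into a config-safe string and back.
final class ConfigPayload {

    // Java-serialize the object, then Base64-encode the bytes into a plain string.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: Base64-decode the string, then deserialize the object.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> lookup = new HashMap<>();
        lookup.put("threshold", 42);

        // Stand-in for the job Configuration (assumption): a string-to-string map.
        Map<String, String> conf = new HashMap<>();
        conf.put("example.lookup", encode(lookup));   // driver side

        // Mapper side: read the string back and rebuild the object.
        @SuppressWarnings("unchecked")
        HashMap<String, Integer> restored =
                (HashMap<String, Integer>) decode(conf.get("example.lookup"));
        System.out.println(restored.get("threshold")); // prints 42
    }
}
```

Base64 adds roughly a third to the serialized size, which is one more reason to keep this to small objects and use the distributed cache for anything larger.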
Re: Passing data via Configuration
Peter Cogan 2013-02-08, 19:51
Hi Rob,
Thanks for the explanation. I had also thought about 'cheating' by serialising, and I guess that's the way to go in my case, as the data structure is really quite small.
Thanks!