Re: Passing data via Configuration
You could, but this is generally discouraged. Pig does something like this: it serializes the object into a byte array, base64-encodes the bytes into a string, and puts that string in the config. The problem is that the config can grow very large. In the 1.0 line of Hadoop, the maximum size of the Job's config is limited to keep the Job Tracker from running out of memory. In V2 this is less of a concern, because it is your own application master that has to read it all in.

In general, if it is a very small amount of data you can play games like this; if it is a large amount of data, you probably want to use the distributed cache instead.
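
For the small-data case, the serialize-then-base64 trick could look roughly like the sketch below. This is only an illustration under assumptions, not Pig's actual code: the ConfigCodec class and its encode/decode methods are hypothetical names, it relies on plain Java serialization, and it uses java.util.Base64, which needs Java 8 or later. The object (and everything it references) has to implement Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Hadoop or Pig.
final class ConfigCodec {
    private ConfigCodec() {}

    // Serialize the object to bytes, then base64-encode the bytes into a
    // String that is safe to store as a single config value.
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: base64-decode the String, then deserialize.
    // The caller casts the result back to the expected type.
    static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the driver you would then call something like conf.set("my.key", ConfigCodec.encode(myMap)) before submitting the job, and in the Mapper's setup() read it back with ConfigCodec.decode(context.getConfiguration().get("my.key")). The key name "my.key" is just a placeholder. Keep in mind that base64 inflates the data by about a third, which is one more reason to keep this to small objects.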

--Bobby

From: Peter Cogan <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Friday, February 8, 2013 9:15 AM
To: [EMAIL PROTECTED]
Subject: Passing data via Configuration

Hi,

I have data stored in an object that I want to pass into my Mapper.

I see from Configuration that there are setters and getters for primitives, but is there a way of doing this with non-primitives, either my own classes or built-in classes (such as HashMap)?

thanks!
Peter