MapReduce, mail # user - Large static structures in M/R heap


Adam Phelps 2013-02-27, 18:42
Robert Evans 2013-02-27, 18:56
Adam Phelps 2013-02-27, 19:40
Re: Large static structures in M/R heap
David Rosenstrauch 2013-02-27, 18:53
On 02/27/2013 01:42 PM, Adam Phelps wrote:
> We have a job that uses a large lookup structure that gets created as a
> static class during the map setup phase (and we have the JVM reused so
> this only takes place once).  However of late this structure has grown
> drastically (due to items beyond our control) and we've seen a
> substantial increase in map time due to the lower available memory.
>
> Are there any easy solutions to this sort of problem?  My first thought
> was to see if it was possible to have all tasks for a job execute in
> parallel within the same JVM, but I'm not seeing any setting that would
> allow that.  Beyond that my only ideas are to move that data into an
> external one-per-node key-value store like memcached, but I'm worried
> the additional overhead of sending a query for each value being mapped
> would also kill the job performance.
>
> - Adam
>

We use a solution similar to the one you suggested to address this issue.
Though the in-memory app we run on each datanode is a proprietary one
that allows for pipelining of queries, which obviously helps optimize this.

Still, even using off-the-shelf memcached and incurring the overhead of
a query per value, the speed might turn out to be more acceptable than
you think.  Maybe run a small-scale test to benchmark it first.
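
As a rough illustration of why the per-query overhead matters, here is a
minimal Java sketch that counts round trips for query-per-value lookups
versus batched lookups. Note that this shows batching (multi-get), not
true pipelining; it is a simpler technique aimed at the same overhead.
Everything here is hypothetical: KeyValueStore, CountingStore and lookup
are made-up names, and the store is an in-memory fake rather than a real
memcached client, so it only measures the number of round trips, not
actual latency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedLookupSketch {

    /** Hypothetical stand-in for a per-node key-value store client. */
    interface KeyValueStore {
        /** Fetches all given keys; each call models one network round trip. */
        Map<String, String> getAll(List<String> keys);
    }

    /** In-memory fake (an assumption, not a real client) that counts round trips. */
    static class CountingStore implements KeyValueStore {
        private final Map<String, String> data;
        int roundTrips = 0;

        CountingStore(Map<String, String> data) {
            this.data = data;
        }

        @Override
        public Map<String, String> getAll(List<String> keys) {
            roundTrips++;
            Map<String, String> found = new HashMap<>();
            for (String key : keys) {
                String value = data.get(key);
                if (value != null) {
                    found.put(key, value);
                }
            }
            return found;
        }
    }

    /** Looks up every key, sending at most batchSize keys per round trip. */
    static Map<String, String> lookup(KeyValueStore store, List<String> keys, int batchSize) {
        Map<String, String> results = new HashMap<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            results.putAll(store.getAll(keys.subList(start, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.put("k" + i, "v" + i);
            keys.add("k" + i);
        }

        // Batch size 1 models one query per value being mapped.
        CountingStore perValue = new CountingStore(data);
        lookup(perValue, keys, 1);

        // Batch size 100 models grouping lookups before querying.
        CountingStore batched = new CountingStore(data);
        lookup(batched, keys, 100);

        System.out.println("query-per-value round trips: " + perValue.roundTrips);
        System.out.println("batched round trips: " + batched.roundTrips);
    }
}
```

In a real small-scale benchmark you would swap the fake for an actual
client and time the map phase on a sample of the input, since round-trip
counts alone do not say how much each round trip costs on your network.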

HTH,

DR