Re: Large static structures in M/R heap
David Rosenstrauch 2013-02-27, 18:53
On 02/27/2013 01:42 PM, Adam Phelps wrote:
> We have a job that uses a large lookup structure that gets created as a
> static class during the map setup phase (and we have the JVM reused so
> this only takes place once). However of late this structure has grown
> drastically (due to items beyond our control) and we've seen a
> substantial increase in map time due to the lower available memory.
>
> Are there any easy solutions to this sort of problem? My first thought
> was to see if it was possible to have all tasks for a job execute in
> parallel within the same JVM, but I'm not seeing any setting that would
> allow that. Beyond that my only ideas are to move that data into an
> external one-per-node key-value store like memcached, but I'm worried
> the additional overhead of sending a query for each value being mapped
> would also kill the job performance.
>
> - Adam
>

We use a solution similar to the one you suggested to address this
issue. In our case, though, the in-memory app we run on each datanode is
a proprietary one that allows pipelining of queries, which obviously
helps optimize this.

Still, even with off-the-shelf memcached and the overhead of one query
per value, the speed might turn out to be more acceptable than you
think. Maybe run a small-scale test to benchmark it first.

HTH,

DR
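
For anyone wanting to try the small-scale benchmark suggested above, here is a minimal sketch of what could be measured: one query per value versus grouping keys so that a batch of lookups shares a single request. It is only an illustration, not the proprietary app mentioned in the reply. `CountingStore`, `lookup_per_value` and `lookup_batched` are hypothetical names, and the store is a plain dict that merely counts simulated round trips; a real test would swap in an actual key-value client and time it against real data.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


class CountingStore:
    """Hypothetical stand-in for a per-node key-value store client.

    Backed by a plain dict; it only counts simulated round trips.
    A real client (for memcached, say) would go over the network here.
    """

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        # One request per key: the "query for each value" case.
        self.round_trips += 1
        return self._data.get(key)

    def get_many(self, keys: Iterable[str]) -> Dict[str, str]:
        # One request for a whole group of keys; missing keys are omitted.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def lookup_per_value(
    store: CountingStore, keys: Iterable[str]
) -> Iterator[Tuple[str, Optional[str]]]:
    """Look up each key with its own request."""
    for key in keys:
        yield key, store.get(key)


def _flush(
    store: CountingStore, batch: List[str]
) -> List[Tuple[str, Optional[str]]]:
    # Deduplicate keys for the request, then answer in input order.
    found = store.get_many(set(batch))
    return [(key, found.get(key)) for key in batch]


def lookup_batched(
    store: CountingStore, keys: Iterable[str], batch_size: int = 100
) -> Iterator[Tuple[str, Optional[str]]]:
    """Buffer keys and issue one request per batch_size keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) >= batch_size:
            yield from _flush(store, batch)
            batch = []
    if batch:  # final partial batch
        yield from _flush(store, batch)


if __name__ == "__main__":
    store = CountingStore({f"k{i}": f"v{i}" for i in range(1000)})
    keys = [f"k{i % 1200}" for i in range(10_000)]  # includes some misses

    list(lookup_per_value(store, keys))
    print("per-value round trips:", store.round_trips)

    store.round_trips = 0
    list(lookup_batched(store, keys, batch_size=100))
    print("batched round trips:", store.round_trips)
```

Both functions return the same (key, value) pairs in the same order, so the lookup strategy can be swapped without touching the rest of the map logic. In a real mapper, batching means buffering input records until the batch is full and flushing the remainder when the task finishes, so the extra memory held by the buffer is part of what the benchmark should account for.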