We have a job that uses a large lookup structure that gets created as a
static class during the map setup phase (and we have the JVM reused so
this only takes place once). However, of late this structure has grown
drastically (due to factors beyond our control), and we've seen a
substantial increase in map time due to the lower available memory.
Are there any easy solutions to this sort of problem? My first thought
was to see if it was possible to have all tasks for a job execute in
parallel within the same JVM, but I'm not seeing any setting that would
allow that. Beyond that, my only idea is to move that data into an
external one-per-node key-value store like memcached, but I'm worried
the additional overhead of sending a query for each value being mapped
would also kill the job performance.
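
To make the setup concrete, here is a minimal sketch of the pattern
described above, written without any Hadoop dependencies so it runs on
its own. The class name SharedLookup, the load() helper, and the sample
keys are all placeholders for illustration, not our actual code; in the
real job the setup() call would sit in the mapper's setup method.

```java
import java.util.HashMap;
import java.util.Map;

class SharedLookup {
    // Static, so the table lives as long as the JVM and is shared by
    // every task that runs in it when JVM reuse is enabled.
    private static Map<String, String> table;

    // Counts how often the expensive load actually ran (illustration only).
    private static int loadCount = 0;

    // Hypothetical loader; stands in for reading the real lookup data.
    private static Map<String, String> load() {
        loadCount++;
        Map<String, String> m = new HashMap<>();
        m.put("key1", "value1");
        m.put("key2", "value2");
        return m;
    }

    // Called from each task's setup phase; only the first call loads.
    static synchronized void setup() {
        if (table == null) {
            table = load();
        }
    }

    static String lookup(String key) {
        return table.get(key);
    }

    static int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        // Simulate three tasks running one after another in a reused JVM.
        for (int task = 0; task < 3; task++) {
            setup();
            System.out.println("task " + task + ": " + lookup("key1"));
        }
        System.out.println("loads: " + getLoadCount());
    }
}
```

Within one JVM the load runs only once no matter how many tasks call
setup(), but each separate task JVM on a node still holds its own full
copy of the table, which is where the memory pressure comes from.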