Single JVM, many tasks: How do I know when I'm on the last map task
Saptarshi Guha 2013-02-22, 20:17
Hello,
In my Java Hadoop job, I have set the JVM reuse variable to -1, so a JVM will process multiple tasks. I have also arranged that, instead of being written to the job context, the keys and values are accumulated in a hashtable. When the bytes written to this table reach BUFSIZE (e.g. 150MB), I call my reducer (or what some call a combiner) inside the map task.

However, if BUFSIZE is never reached, my reducer is never called, so I have to flush the table. I could flush it in the map class's 'cleanup' method, but in that case the data would be rewritten to the same hashtable. At some point this hashtable must be written to the job context and on to the Hadoop Reduce stage.

The way I see it, if I intend to share this hashtable across map tasks (within the same JVM), I need to know when the JVM has reached its final map task. When that task is complete, I know I *must* flush the table to the job context.

Hopefully I've been somewhat clear. Does Hadoop 0.20.2 have an API that tells the child JVM whether it's on the last map task?

Cheers
Saptarshi
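
For readers following along, the buffering scheme described above might look roughly like the plain-Java sketch below. It is only an illustration under stated assumptions, not the poster's actual code: `BufferedCombiner`, `Sink`, `collect` and `flush` are hypothetical names, `Sink` stands in for the job context's write call, the combine step is assumed to be a simple sum, and the size estimate is deliberately crude. Unlike the design in the post, `flush` here writes straight to the sink rather than back into the table. The sketch does not answer the question of how to detect the last map task.

```java
import java.util.HashMap;
import java.util.Map;

public class BufferedCombiner {
    /** Hypothetical stand-in for writing a pair to the Hadoop job context. */
    public interface Sink {
        void write(String key, long value);
    }

    // Static, so every map task that runs in this (reused) JVM shares the table.
    private static final Map<String, Long> BUFFER = new HashMap<>();
    private static long bufferedBytes = 0;

    private final long bufSize; // plays the role of BUFSIZE in the post
    private final Sink sink;

    public BufferedCombiner(long bufSize, Sink sink) {
        this.bufSize = bufSize;
        this.sink = sink;
    }

    /** Would be called from map(): combine in memory instead of emitting. */
    public void collect(String key, long value) {
        Long old = BUFFER.get(key);
        if (old == null) {
            // Rough estimate (assumption): key characters plus 8 bytes for the long.
            bufferedBytes += key.length() + 8;
            BUFFER.put(key, value);
        } else {
            BUFFER.put(key, old + value); // assumed combine step: sum
        }
        if (bufferedBytes >= bufSize) {
            flush();
        }
    }

    /** Would be called from cleanup(): emit whatever is left and reset. */
    public void flush() {
        for (Map.Entry<String, Long> e : BUFFER.entrySet()) {
            sink.write(e.getKey(), e.getValue());
        }
        BUFFER.clear();
        bufferedBytes = 0;
    }
}
```

Calling `flush()` from every task's cleanup is the conservative choice: it never loses data, but it gives up combining across tasks, which is exactly the trade-off the question is trying to avoid.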