I have data processing logic that takes an Iterable<Some> as its input,
i.e. pretty much the same as the reducer's API. But I need to use this
code in the map phase, where each element arrives one at a time through
the map() method.
To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup() and wait
for to complete in cleanup()
* keep a buffer which I fill with map input items (the Iterable object is
fed from this buffer for as long as it has something in it)
* write to the buffer until it is full and only then switch to the thread
which runs the processing code
(assumption: the processing logic always reads data from the buffer to the
end; if processing fails, the whole job is marked as failed).
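To make the idea concrete, here is a minimal sketch of the hand-off. It is not the class mentioned further down, and it swaps the hand-rolled buffer for a bounded java.util.concurrent.ArrayBlockingQueue, so the "switch" is just the producer blocking when the queue is full. The name IterableFeeder and its methods start(), put() and finish() are hypothetical; the intent is that they would be called from setup(), map() and cleanup() respectively. It assumes items are never null (ArrayBlockingQueue rejects null) and that the processing code reads its input to the end.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: feeds items pushed one at a time into code that expects an Iterable. */
class IterableFeeder<T> {
    private static final Object END = new Object(); // end-of-input marker
    private final BlockingQueue<Object> buffer;
    private final Thread worker;
    private volatile Throwable failure;             // set if the processing code throws

    IterableFeeder(int capacity, Consumer<Iterable<T>> processing) {
        buffer = new ArrayBlockingQueue<>(capacity);
        worker = new Thread(() -> {
            try {
                processing.accept(this::iterator);  // Iterable has a single abstract method
            } catch (Throwable t) {
                failure = t;
            }
        }, "iterable-feeder");
    }

    /** Intended for setup(): starts the processing thread. */
    void start() {
        worker.start();
    }

    /** Intended for map(): blocks while the buffer is full. Requires start() first. */
    void put(T item) throws InterruptedException {
        enqueue(item);
    }

    /** Intended for cleanup(): signals end of input and waits for the processing thread. */
    void finish() throws InterruptedException {
        enqueue(END);
        worker.join();
        rethrowIfFailed();
    }

    private void enqueue(Object o) throws InterruptedException {
        // Poll with a timeout so a dead consumer cannot block the caller forever.
        while (!buffer.offer(o, 100, TimeUnit.MILLISECONDS)) {
            rethrowIfFailed();
            if (!worker.isAlive()) {
                throw new IllegalStateException("processing stopped before reading all input");
            }
        }
    }

    private void rethrowIfFailed() {
        Throwable t = failure;
        if (t != null) {
            throw new IllegalStateException("processing failed", t);
        }
    }

    private Iterator<T> iterator() {
        return new Iterator<T>() {
            private Object next; // one-element lookahead; null means "not fetched yet"

            @Override
            public boolean hasNext() {
                if (next == null) {
                    try {
                        next = buffer.take(); // blocks until an item or END arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }
                return next != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T item = (T) next;
                next = null;
                return item;
            }
        };
    }
}
```

With this shape, a failure in the processing thread surfaces as an exception from put() or finish(), which in a mapper would fail the task rather than hang it. Whether a blocking queue is cheaper than the explicit fill-then-switch buffer is something I have not measured.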
I don't expect this to cause any noticeable performance degradation, since
switches between threads are quite rare. The approach also looks safe to
me. Could anyone please confirm that? Or, if there's a better solution,
please let me know.
Btw, you can find a rough cut of the implementation here (small class):
It is in a working state (the unit tests pass, at least).
Thank you in advance!
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase