MapReduce, mail # user - Making input in Map iterable


Making input in Map iterable
Alex Baranau 2010-12-08, 21:06
Hello,

I have data processing logic that takes an Iterable<Some> as its input,
i.e. pretty much the same as the reducer's API. But I need to use this code
in a Mapper, where each element arrives as a separate map() method
invocation.
To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup() and whose
completion I wait for in cleanup()
* keep a buffer which I fill with map input items (the Iterable object is fed
from this buffer for as long as it has items)
* write to the buffer until it is full, and only then switch to the thread
that does the processing.
(Assumption: the processing logic always reads data from the buffer to the
end; if processing fails, the whole job is marked as failed.)
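
For illustration, the steps above can be sketched roughly as follows. This is
not the code from the linked class: PushToIterableBridge, push() and finish()
are hypothetical names, and a bounded ArrayBlockingQueue stands in for the
hand-written buffer, so items are handed over as they arrive rather than only
once the buffer is full. Items are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical helper: turns push-style map() calls into a pull-style Iterable. */
final class PushToIterableBridge<T> {
    private static final Object END = new Object(); // end-of-input marker
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final BlockingQueue<Object> buffer;
    private final Thread worker;

    /** Would be called from setup(): starts the processing thread. */
    PushToIterableBridge(int capacity, Consumer<Iterable<T>> processor) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                processor.accept(this::iterator); // single-use Iterable
            } catch (Throwable t) {
                failure.set(t); // reported to the caller in push()/finish()
            }
        }, "iterable-processor");
        this.worker.start();
    }

    /** Would be called from map(): blocks while the buffer is full. */
    void push(T item) throws InterruptedException {
        offer(item);
    }

    /** Would be called from cleanup(): ends the input, waits for the processor. */
    void finish() throws InterruptedException {
        offer(END);
        worker.join();
        rethrowIfFailed();
    }

    private void offer(Object o) throws InterruptedException {
        // Retry with a timeout so a dead processor cannot block the caller forever.
        while (!buffer.offer(o, 100, TimeUnit.MILLISECONDS)) {
            rethrowIfFailed();
            if (!worker.isAlive()) {
                throw new IllegalStateException("processor stopped before reading all input");
            }
        }
    }

    private void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("processing failed", t);
        }
    }

    private Iterator<T> iterator() {
        return new Iterator<T>() {
            private Object next; // null means "nothing fetched yet"

            @Override
            public boolean hasNext() {
                if (next == null) {
                    try {
                        next = buffer.take(); // blocks until an item or END arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }
                return next != END;
            }

            @SuppressWarnings("unchecked")
            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T item = (T) next;
                next = null;
                return item;
            }
        };
    }
}
```

In a Mapper, the bridge would presumably be created in setup(), fed from
map(), and closed in cleanup(); an exception thrown by finish() would then
fail the task, which matches the assumption stated above.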

I don't expect this to cause any noticeable performance degradation, since
switches between threads are quite rare. The approach also looks safe to me.
Could anyone please confirm that? Or, if there's a better solution, please
let me know.

Btw, you can find a rough cut of the implementation here (small class):
https://github.com/sematext/HBaseHUT/blob/master/src/main/java/com/sematext/hbase/hut/UpdatesProcessingMrJob.java
It is in a working state (the unit tests pass, at least).

Thank you in advance!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase