MapReduce >> mail # user >> Making input in Map iterable


Making input in Map iterable
Hello,

My data processing logic takes an Iterable<Some> as input, i.e. pretty much
the same as the reducer's API. But I need to use this code in a Mapper,
where each element arrives as a separate map() method invocation.
To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup(), and wait
for it to complete in cleanup()
* keep a buffer which I fill with map input items (the Iterable object is
fed from this buffer for as long as it has items)
* write to the buffer until it is full, and only then switch to the thread
which does the processing.
(Assumption: the processing logic always reads data from the buffer to the
end; if processing fails, the whole job is marked as failed.)
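
For illustration, the handoff described above can be sketched roughly as follows. This is not the code from the linked class: it swaps the hand-written "fill the buffer, then switch" logic for a bounded java.util.concurrent.ArrayBlockingQueue, and the names PushIterable, PushIterableDemo, process and sumViaWorker are hypothetical. It uses no Hadoop classes; comments mark where setup(), map() and cleanup() would sit.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: exposes items pushed one at a time as a blocking, single-pass Iterable. */
class PushIterable<T> implements Iterable<T> {
  private static final Object END = new Object(); // end-of-input marker
  private final BlockingQueue<Object> queue;

  PushIterable(int capacity) {
    queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Producer side (would be called from map()); blocks while the buffer is full. */
  void put(T item) throws InterruptedException {
    queue.put(item);
  }

  /** Producer side (would be called from cleanup()); signals that no more items will arrive. */
  void close() throws InterruptedException {
    queue.put(END);
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      private Object next;  // null means "nothing fetched yet"
      private boolean done; // true once the end marker has been seen

      @Override
      public boolean hasNext() {
        if (done) return false;
        if (next == null) {
          try {
            next = queue.take(); // blocks until the producer supplies an item
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for input", e);
          }
          if (next == END) {
            done = true;
            next = null;
          }
        }
        return !done;
      }

      @Override
      @SuppressWarnings("unchecked")
      public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T item = (T) next;
        next = null;
        return item;
      }
    };
  }
}

class PushIterableDemo {
  /** Stand-in for the existing logic that expects an Iterable. */
  static long process(Iterable<Integer> items) {
    long sum = 0;
    for (int i : items) sum += i;
    return sum;
  }

  static long sumViaWorker(int n) {
    PushIterable<Integer> input = new PushIterable<>(16);
    long[] result = new long[1];
    Thread worker = new Thread(() -> result[0] = process(input));
    worker.start();                              // would happen in setup()
    try {
      for (int i = 0; i < n; i++) input.put(i);  // one put per map() call
      input.close();                             // would happen in cleanup()
      worker.join();                             // wait for processing to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return result[0];
  }
}
```

Note that the sketch relies on the same assumption as above: if the processing thread stops reading early, put() blocks forever once the queue is full, so a real mapper would also need to detect a dead worker and fail the task.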

I don't expect this to cause any noticeable performance degradation, since
switches between threads are quite rare. The approach also looks safe to
me. Could anyone please confirm that? Or, if there's a better solution,
please let me know.

Btw, you can find a rough cut of the implementation here (small class):
https://github.com/sematext/HBaseHUT/blob/master/src/main/java/com/sematext/hbase/hut/UpdatesProcessingMrJob.java
It is in a working state (the unit tests pass, at least).

Thank you in advance!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase