NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> Java jars and MapReduce


Java jars and MapReduce
Hello,

Current design: I have a Java object, MyObjectA. MyObjectA goes through
three processors (jars) that run in sequence and do a lot of processing
to enrich A with a large amount of additional data (think ETL). The final
result is MyObjectD. (Note: MyObjectD is essentially A with many more fields
added to it, but I want to be clear that they are very different objects.)
When MyObjectD is ready, it is saved to my non-relational database (Accumulo).
Currently, all of this is driven by the Quartz Scheduler: a
List<MyObjectA> is submitted for processing every N minutes. Everything is
written in Java, and there is a lot of back-and-forth with Accumulo
(to read the tables that help convert A to D).

We split the processing into three processors just because it was more
convenient and we didn't want everything rolled up in one processor. Having
said that, I can definitely merge the three into ONE processor. But my
question is: what are the things (generically speaking) I need to be
concerned about or look into to make this a MapReduce job? I am
asking for pointers on where to even start.

Let's say all my processing is done in mappers, so each mapper takes
MyObjectA as input and emits MyObjectD as output, and my reducer
simply writes all the MyObjectD objects to Accumulo. Is achieving this
as easy as just submitting the jar to Hadoop?
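
To make the shape I have in mind concrete, here is a minimal sketch in plain Java with no Hadoop or Accumulo classes. Everything in it is a placeholder I made up for illustration: the fields on MyObjectA and MyObjectD, the stage1/addTag methods standing in for the three processors, and the sink standing in for the Accumulo write. It only shows the three processors merged into one per-record step that is applied independently to every object in a batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class MapSketch {
    // Hypothetical stand-in for the real input type.
    static final class MyObjectA {
        final String id;
        MyObjectA(String id) { this.id = id; }
    }

    // Hypothetical stand-in for the enriched output type.
    static final class MyObjectD {
        final String id;
        final List<String> enrichments;
        MyObjectD(String id, List<String> enrichments) {
            this.id = id;
            this.enrichments = enrichments;
        }
    }

    // Placeholder for the first processor: turns A into the start of D.
    static MyObjectD stage1(MyObjectA a) {
        return new MyObjectD(a.id, new ArrayList<>(Arrays.asList("stage1")));
    }

    // Placeholder for the later processors: returns a copy with one more field.
    static MyObjectD addTag(MyObjectD d, String tag) {
        List<String> copy = new ArrayList<>(d.enrichments);
        copy.add(tag);
        return new MyObjectD(d.id, copy);
    }

    // The three processors merged into one per-record step.
    // This is the part that would play the role of the map function.
    static MyObjectD mapOne(MyObjectA a) {
        MyObjectD d = stage1(a);
        d = addTag(d, "stage2");
        return addTag(d, "stage3");
    }

    // Applies the per-record step to a batch and hands each result to a sink.
    // The sink is a stand-in for whatever finally writes MyObjectD out.
    static void runBatch(List<MyObjectA> batch, Consumer<MyObjectD> sink) {
        for (MyObjectA a : batch) {
            sink.accept(mapOne(a));
        }
    }

    public static void main(String[] args) {
        List<MyObjectD> written = new ArrayList<>();
        runBatch(Arrays.asList(new MyObjectA("a1"), new MyObjectA("a2")), written::add);
        for (MyObjectD d : written) {
            System.out.println(d.id + " " + d.enrichments);
        }
    }
}
```

Each call to mapOne depends only on its own input object, which is what makes me think the work could be spread across mappers. The lookups against Accumulo that the real processors do are not modeled here.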

I guess, overall, I want to know how one goes about blindly submitting a
.jar (a Java app) and making it run as a MapReduce job.
We are going this route because multi-threading won't solve our problem:
we now have to process objects in batches, and they are coming in every
minute.

Thank you in advance for any and all help.
Steve Lewis 2013-03-02, 04:21