Hadoop >> mail # user >> How do I synchronize Hadoop jobs?


Re: How do I synchronize Hadoop jobs?
Hi McNeil,

Have a look at Oozie. It is meant for workflow management in Hadoop and can serve your purpose.

------Original Message------
From: W.P. McNeill
To: Hadoop Mailing List
ReplyTo: [EMAIL PROTECTED]
Subject: How do I synchronize Hadoop jobs?
Sent: Feb 16, 2012 00:53

Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
another job, C, that takes the output of both A and B as input. I want to
run A and B at the same time, wait until both have finished, and then
launch C. What is the best way to do this?

I know the answer if I've got a single Java client program that launches A,
B, and C. But what if I don't have the option to launch all of them from a
single Java program? (Say I've got a much more complicated system with many
steps happening between A-B and C.) How do I synchronize between jobs, make
sure there are no race conditions, etc.? Is this what ZooKeeper is for?
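
For reference, the single-client case mentioned above (launch A and B together, wait for both, then launch C) could be sketched roughly as follows. This is only an illustration under stated assumptions: `runJob` is a hypothetical stand-in that records the job name and returns a fake output label. In a real client it would presumably wrap a blocking launch such as `Job.waitForCompletion(true)`. The class and method names are invented for the example, and the sketch does not cover the harder case where the jobs are started by separate programs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class JobSync {

    // Hypothetical stand-in for "launch a Hadoop job and block until it finishes".
    // It records the job name in a shared log and returns a fake output label.
    static String runJob(String name, List<String> log) {
        log.add(name);
        return name + "-output";
    }

    // Runs A and B concurrently, then runs C only after both have completed.
    // Returns the order in which the jobs were recorded.
    static List<String> runPipeline() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a =
                CompletableFuture.supplyAsync(() -> runJob("A", log), pool);
            CompletableFuture<String> b =
                CompletableFuture.supplyAsync(() -> runJob("B", log), pool);

            // thenCombine invokes its function only once both a and b are done,
            // so C always sees the outputs of A and B.
            CompletableFuture<String> c = a.thenCombine(
                b, (outA, outB) -> runJob("C(" + outA + "," + outB + ")", log));

            c.join(); // block until C has finished
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline());
    }
}
```

A and B may be recorded in either order, but C is always recorded last, because it is not started until both futures have completed.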

Regards
Bejoy K S

From handheld, Please excuse typos.