Re: How do I synchronize Hadoop jobs?
Hi McNeil,

Have a look at Oozie. It is meant for workflow management in Hadoop and can serve your purpose.

------Original Message------
From: W.P. McNeill
To: Hadoop Mailing List
Subject: How do I synchronize Hadoop jobs?
Sent: Feb 16, 2012 00:53

Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
another job, C, that takes the output of both A and B as input. I want to
run A and B at the same time, wait until both have finished, and then
launch C. What is the best way to do this?

I know the answer if I've got a single Java client program that launches A,
B, and C. But what if I don't have the option to launch all of them from a
single Java program? (Say I've got a much more complicated system with many
steps happening between A-B and C.) How do I synchronize between jobs, make
sure there are no race conditions, etc.? Is this what ZooKeeper is for?
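
For reference, the single-client case mentioned in the question (run A and B
in parallel, wait for both, then launch C) could be sketched roughly as below.
This is only an illustration, not code from the thread: the class JobSync and
the helper runJob are hypothetical stand-ins, and in a real client runJob
would presumably configure a Hadoop Job and block on
Job.waitForCompletion(true).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class JobSync {
    // Records the order in which the stand-in jobs complete.
    static final List<String> finished = new CopyOnWriteArrayList<>();

    // Hypothetical stand-in for "submit a Hadoop job and block until it ends".
    // A real client would build a Job here and return job.waitForCompletion(true).
    static boolean runJob(String name) {
        finished.add(name);
        return true; // assume success for the purposes of the sketch
    }

    // Runs A and B concurrently, waits for both, then runs C only if both succeeded.
    static boolean runPipeline() {
        CompletableFuture<Boolean> a = CompletableFuture.supplyAsync(() -> runJob("A"));
        CompletableFuture<Boolean> b = CompletableFuture.supplyAsync(() -> runJob("B"));

        // join() blocks the calling thread until the corresponding job has finished.
        boolean aOk = a.join();
        boolean bOk = b.join();

        // C is launched strictly after both A and B have completed.
        return aOk && bOk && runJob("C");
    }

    public static void main(String[] args) {
        System.out.println(runPipeline() ? "C finished after A and B" : "pipeline failed");
    }
}
```

A workflow manager such as Oozie expresses the same dependency declaratively
(a fork into A and B followed by a join before C), which is what makes it
usable when the jobs cannot all be launched from one program.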

Bejoy K S

From handheld, Please excuse typos.