HDFS, mail # user - Re: How to submit Tool jobs programatically in parallel?


RE: How to submit Tool jobs programatically in parallel?
David Parks 2012-12-14, 06:14
Can I do that with s3distcp / distcp?  The job is configured inside the
run() method of s3distcp (it implements Tool), so I don't think I can use
this approach. I already use it for the jobs I control, of course, but the
problem is tools like distcp, where I don't control the configuration.

 

Dave

 

 

From: Manoj Babu [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 14, 2012 12:57 PM
To: [EMAIL PROTECTED]
Subject: Re: How to submit Tool jobs programatically in parallel?

 

David,

 

Instead of runJob(), you can try submitJob(), as below. It returns without waiting for the job to finish.

 

JobClient jc = new JobClient(job);  // job is a JobConf

RunningJob running = jc.submitJob(job);  // returns immediately; poll running.isComplete()

 

 
Cheers!

Manoj.

On Fri, Dec 14, 2012 at 10:09 AM, David Parks <[EMAIL PROTECTED]>
wrote:

I'm submitting unrelated jobs programmatically (using AWS EMR) so they run
in parallel.

I'd like to run an s3distcp job in parallel as well, but the interface to
that job is a Tool, e.g. ToolRunner.run(...).

ToolRunner blocks until the job completes though, so presumably I'd need to
create a thread pool to run these jobs in parallel.

But creating multiple threads to submit concurrent jobs via ToolRunner,
each one blocking until its job completes, just feels improper. Is there an
alternative?
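
For reference, the thread-pool approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: BlockingTool is a hypothetical stand-in for org.apache.hadoop.util.Tool so that the code compiles without Hadoop on the classpath, and the class and method names (ParallelToolSubmit, runAll) are made up for this example. With Hadoop, the call inside the task would presumably be ToolRunner.run(conf, tool, args).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelToolSubmit {

    // Hypothetical stand-in for org.apache.hadoop.util.Tool (an assumption,
    // used only so this sketch is self-contained).
    interface BlockingTool {
        int run(String[] args) throws Exception;
    }

    // Runs every tool on its own pool thread and returns the exit codes
    // in the same order as the input list.
    static List<Integer> runAll(List<BlockingTool> tools, String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tools.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (BlockingTool tool : tools) {
                // With Hadoop this would be ToolRunner.run(conf, tool, args).
                Callable<Integer> task = () -> tool.run(args);
                futures.add(pool.submit(task));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get()); // blocks until that tool has finished
            }
            return exitCodes;
        } finally {
            pool.shutdown(); // let the worker threads exit once all tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        List<BlockingTool> tools = new ArrayList<>();
        tools.add(a -> { Thread.sleep(200); return 0; }); // simulated blocking job
        tools.add(a -> { Thread.sleep(200); return 1; }); // simulated blocking job
        System.out.println(runAll(tools, new String[0])); // prints [0, 1]
    }
}
```

The blocking still happens, but it is confined to the pool threads, and the caller only waits when it collects the exit codes. Whether this is preferable to a non-blocking submit depends on whether the tool exposes its job configuration at all.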