HDFS, mail # user - Re: How to submit Tool jobs programatically in parallel?

Manoj Babu 2012-12-14, 05:57
RE: How to submit Tool jobs programatically in parallel?
David Parks 2012-12-14, 06:14
Can I do that with s3distcp / distcp?  The job is being configured in the
run() method of s3distcp (as it implements Tool), so I don't think I can use
this approach. I use it for the jobs I control, of course, but the problem
is tools like distcp, where I don't control the configuration.
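
For reference, the thread-pool fallback mentioned in my original question would
look roughly like the sketch below. This is only an illustration under stated
assumptions: ParallelToolSubmitter is a made-up helper name, not part of Hadoop,
and each Callable stands in for a blocking call such as ToolRunner.run(conf,
tool, args). It uses only java.util.concurrent, so it runs without Hadoop on
the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not a Hadoop class): runs blocking tool invocations concurrently.
final class ParallelToolSubmitter {

    // Each Callable is assumed to wrap one blocking invocation, e.g.
    // () -> ToolRunner.run(conf, tool, args), and to return its exit code.
    static List<Integer> runAll(List<Callable<Integer>> tools, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the input list.
            List<Future<Integer>> futures = pool.invokeAll(tools);
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                // get() rethrows a task's failure wrapped in ExecutionException.
                exitCodes.add(f.get());
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> tools = new ArrayList<>();
        // Stand-ins for two blocking jobs that both exit with code 0.
        tools.add(() -> { Thread.sleep(100); return 0; });
        tools.add(() -> { Thread.sleep(100); return 0; });
        System.out.println(runAll(tools, 2));
    }
}
```

Each thread does nothing but wait on its job, so the pool only needs as many
threads as there are jobs to run side by side.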





Try the following: instead of runJob(), you can use submitJob().


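JobClient jc = new JobClient(job);   // assumes job is a JobConf
RunningJob rj = jc.submitJob(job);   // returns immediately; runJob() would block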





On Fri, Dec 14, 2012 at 10:09 AM, David Parks <[EMAIL PROTECTED]> wrote:

I'm submitting unrelated jobs programmatically (using AWS EMR) so they run
in parallel.

I'd like to run an s3distcp job in parallel as well, but the interface to
that job is a Tool, e.g. ToolRunner.run(...).

ToolRunner blocks until the job completes though, so presumably I'd need to
create a thread pool to run these jobs in parallel.

But creating multiple threads to submit concurrent jobs via ToolRunner,
blocking on the jobs' completion, just feels improper. Is there an


George Datskos 2012-12-14, 06:31
Manoj Babu 2012-12-14, 06:27