

Manoj Babu 2012-12-14, 05:57
David Parks 2012-12-14, 06:14
George Datskos 2012-12-14, 06:31
Re: How to submit Tool jobs programmatically in parallel?
Can you show some sample code for submitting a distcp job?

Cheers!
Manoj.

On Fri, Dec 14, 2012 at 11:44 AM, David Parks <[EMAIL PROTECTED]> wrote:

> Can I do that with s3distcp / distcp? The job is configured in the run()
> method of s3distcp (as it implements Tool), so I don’t think I can use this
> approach. I use it for the jobs I control, of course, but the problem is
> things like distcp, where I don’t control the configuration.
>
> Dave
>
> *From:* Manoj Babu [mailto:[EMAIL PROTECTED]]
> *Sent:* Friday, December 14, 2012 12:57 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: How to submit Tool jobs programmatically in parallel?
>
> David,
>
> Instead of runJob(), which blocks until the job finishes, you can try
> submitJob(), which returns as soon as the job has been submitted:
>
> JobClient jc = new JobClient(job);
> jc.submitJob(job);
>
> Cheers!
> Manoj.
>
> On Fri, Dec 14, 2012 at 10:09 AM, David Parks <[EMAIL PROTECTED]> wrote:
>
> I'm submitting unrelated jobs programmatically (using AWS EMR) so they run
> in parallel.
>
> I'd like to run an s3distcp job in parallel as well, but the interface to
> that job is a Tool, e.g. ToolRunner.run(...).
>
> ToolRunner blocks until the job completes though, so presumably I'd need to
> create a thread pool to run these jobs in parallel.
>
> But creating multiple threads to submit concurrent jobs via ToolRunner,
> each blocking on its job's completion, just feels improper. Is there an
> alternative?
>
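
For reference, the thread-pool approach described in the original question could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not s3distcp's or Hadoop's own API: `BlockingTool` and `runAll` are hypothetical names, used so that the example runs without Hadoop on the classpath. In real code the task body would call `ToolRunner.run(conf, tool, args)`, which blocks until the tool finishes and returns its exit code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelToolRunner {

    /** Hypothetical stand-in for a blocking Tool invocation; not a Hadoop type. */
    interface BlockingTool {
        int run(String[] args) throws Exception;
    }

    /**
     * Runs every tool on its own pool thread and returns the exit codes
     * in submission order. Each worker thread blocks on its own tool,
     * so the tools themselves execute concurrently.
     */
    static List<Integer> runAll(List<BlockingTool> tools, List<String[]> argsPerTool)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tools.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tools.size(); i++) {
                final BlockingTool tool = tools.get(i);
                final String[] args = argsPerTool.get(i);
                // Assumption: with Hadoop available, this body would be
                // ToolRunner.run(conf, tool, args) instead of tool.run(args).
                Callable<Integer> task = () -> tool.run(args);
                futures.add(pool.submit(task));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> future : futures) {
                exitCodes.add(future.get()); // waits for that tool to finish
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two dummy "copy jobs" that just sleep; placeholders for real tools.
        BlockingTool copyA = a -> { Thread.sleep(200); return 0; };
        BlockingTool copyB = a -> { Thread.sleep(100); return 0; };
        List<Integer> codes = runAll(
                Arrays.asList(copyA, copyB),
                Arrays.asList(new String[] {"a"}, new String[] {"b"}));
        System.out.println("exit codes: " + codes);
    }
}
```

The pool is sized to the number of tools so that no submission waits behind another. Giving each thread its own tool instance and configuration is probably the safer choice, since a tool may keep per-run state.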