HDFS >> mail # user >> Re: How to submit Tool jobs programatically in parallel?


Manoj Babu 2012-12-14, 05:57
David Parks 2012-12-14, 06:14
George Datskos 2012-12-14, 06:31
Re: How to submit Tool jobs programatically in parallel?
Can you show some sample code for submitting a distcp job?

Cheers!
Manoj.

On Fri, Dec 14, 2012 at 11:44 AM, David Parks <[EMAIL PROTECTED]> wrote:

> Can I do that with s3distcp / distcp? The job is configured in the
> run() method of s3distcp (since it implements Tool), so I don’t think I can
> use this approach. I use it for the jobs I control, of course, but the
> problem is tools like distcp, where I don’t control the configuration.
>
> Dave
>
> *From:* Manoj Babu [mailto:[EMAIL PROTECTED]]
> *Sent:* Friday, December 14, 2012 12:57 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: How to submit Tool jobs programatically in parallel?
>
> David,
>
> Instead of runJob(), you can try submitJob(), as below:
>
> JobClient jc = new JobClient(job);
> jc.submitJob(job);
>
> Cheers!
> Manoj.
>
> On Fri, Dec 14, 2012 at 10:09 AM, David Parks <[EMAIL PROTECTED]> wrote:
>
> I'm submitting unrelated jobs programmatically (using AWS EMR) so they run
> in parallel.
>
> I'd like to run an s3distcp job in parallel as well, but the interface to
> that job is a Tool, e.g. ToolRunner.run(...).
>
> ToolRunner blocks until the job completes though, so presumably I'd need to
> create a thread pool to run these jobs in parallel.
>
> But creating multiple threads to submit concurrent jobs via ToolRunner,
> each blocking on its job's completion, just feels improper. Is there an
> alternative?
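For reference, here is a minimal sketch of the thread-pool approach David describes above, using only java.util.concurrent. It is an illustration, not code from this thread: the names ParallelToolRunner, runAll and fakeJob are hypothetical, and fakeJob is a stand-in that only sleeps. In a real setup each Callable would presumably wrap a blocking call such as ToolRunner.run(conf, tool, args) and return its exit code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelToolRunner {

    /**
     * Runs every task on a fixed-size pool and returns the exit codes in
     * submission order. Blocks until all tasks have finished.
     */
    public static List<Integer> runAll(List<Callable<Integer>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task has completed (or failed).
            List<Future<Integer>> futures = pool.invokeAll(tasks);
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get());
            }
            return exitCodes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            // A task threw; surface its cause to the caller.
            throw new IllegalStateException("A job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for a blocking job launch. It only sleeps and
    // returns the given exit code; a real task would call the Tool here.
    public static Callable<Integer> fakeJob(String name, long millis, int exitCode) {
        return () -> {
            Thread.sleep(millis);
            System.out.println(name + " finished with exit code " + exitCode);
            return exitCode;
        };
    }

    public static void main(String[] args) {
        List<Callable<Integer>> jobs = Arrays.asList(
                fakeJob("s3distcp", 200, 0),
                fakeJob("other-job", 100, 0));
        // One thread per job, so all jobs are in flight at the same time.
        System.out.println("Exit codes: " + runAll(jobs, jobs.size()));
    }
}
```

The blocking is confined to the pool's worker threads, so the submitting code only waits once, in runAll. Whether a given Tool is safe to run concurrently inside one JVM is a separate question that this sketch does not address.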