-
Submitting MapReduce job from remote server using JobClient
Amit Sela 2013-01-24, 12:13
Hi all,
I want to run a MapReduce job through the Hadoop Java API from my analytics server. It is not the master, or even a data node, but it has the same Hadoop installation as every node in the cluster.

I tried JobClient.runJob(), but it takes a JobConf as its argument, and a JobConf only accepts mapred Mapper classes, while I use mapreduce.

I also tried JobControl and ControlledJob, but it seems to run the job locally: the map phase just keeps attempting.

Has anyone tried this before? I'm just looking for a way to submit MapReduce jobs from Java code and be able to monitor them.

Thanks,
Amit.
-
Re: Submitting MapReduce job from remote server using JobClient
Harsh J 2013-01-24, 15:12
The Job class itself has a blocking and a non-blocking submitter, similar to the JobClient.runJob() method you discovered. See http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit() (returns as soon as the job is submitted) and the method that follows it, waitForCompletion() (blocks until the job finishes). These seem to be what you're looking for.

--
Harsh J
-
Re: Submitting MapReduce job from remote server using JobClient
Amit Sela 2013-01-24, 16:15
Hi Harsh,
I'm using the Job.waitForCompletion() method to run the job, but I can't see it in the webapp and it doesn't seem to finish. I get:

org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
INFO org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-01-24 08:10:12.521 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
2013-01-24 08:10:12.536 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:12.573 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:12.573 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:12.599 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:12.608 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2013-01-24 08:10:15.509 [Thread-51] INFO org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:15.510 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2013-01-24 08:10:15.511 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
2013-01-24 08:10:15.512 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:15.549 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:15.550 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:15.557 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:15.560 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 0%

After that, instead of moving on to the reduce phase, I keep getting map attempts like:

INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:21.563 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:21.563 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:21.570 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:21.573 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
2013-01-24 08:10:24.529 [Thread-51] INFO org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:24.529 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
2013-01-24 08:10:24.530 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99

Any clues? Thanks for the help.
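
The job ID in this log is itself a clue: IDs beginning with job_local are issued by Hadoop's LocalJobRunner (which also appears in the log lines), and it runs the tasks inside the submitting JVM, so the job would not be expected to show up in the cluster webapp. Below is a minimal sketch of that check, assuming only the "Running job:" line format shown in the log. The class and method names (JobIdCheck, extractJobId, isLocalJob) and the cluster-style ID used for comparison are hypothetical and not part of Hadoop.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: spots local-runner jobs in client log output.
class JobIdCheck {

    // Matches the "Running job: <id>" line that JobClient prints on submission.
    private static final Pattern RUNNING_JOB = Pattern.compile("Running job: (job_\\w+)");

    // Returns the job ID found in a log line, or empty if the line has none.
    static Optional<String> extractJobId(String logLine) {
        Matcher m = RUNNING_JOB.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // LocalJobRunner IDs start with "job_local"; JobTracker IDs embed a timestamp instead.
    static boolean isLocalJob(String jobId) {
        return jobId.startsWith("job_local");
    }

    public static void main(String[] args) {
        String line = "org.apache.hadoop.mapred.JobClient - Running job: job_local_0001";
        extractJobId(line).ifPresent(id ->
                System.out.println(id + " local=" + isLocalJob(id)));
    }
}
```

A check like this could run right after submission, so that a job which silently fell back to the local runner fails fast instead of looping on the client.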
-
Re: Submitting MapReduce job from remote server using JobClient
Amit Sela 2013-01-27, 11:43
Yes I do.
I checked that by printing out Configuration.toString(), and I see only the files I added as resources. Moreover, in my test environment the test analytics server is also a data node (or could that cause more trouble?).

Anyway, I still get:

org.apache.hadoop.mapred.JobClient - Running job: job_local_0001

I don't know what's wrong here. I create a new Configuration(false) to avoid the default settings. I set the resources manually (addResource). I validate it. Is there anything I'm forgetting?

On Thu, Jan 24, 2013 at 9:49 PM, <[EMAIL PROTECTED]> wrote:
> Hi Amit,
>
> Apart from the Hadoop jars, do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you have only the default config files on the analytics server, then your
> MR job will run locally and not on the cluster.
>
> Regards
> Bejoy KS
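
One way to validate the resources independently of Hadoop is to read the XML files directly and look at the property that selects the runner. The sketch below assumes Hadoop 1.x property names: mapred.job.tracker, which the client treats as "local" when it is unset or set to "local". The class RunnerCheck, its methods, and the host name jobtracker.example.com:8021 are hypothetical and only for illustration; the parsing uses the JDK's own DOM parser, not Hadoop's Configuration class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: inspects a Hadoop-style XML resource
// to see which runner its settings would select.
class RunnerCheck {

    // Collects every <property><name>..</name><value>..</value></property> pair.
    static Map<String, String> readProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            String value = childText(property, "value");
            if (name != null && value != null) {
                props.put(name.trim(), value.trim());
            }
        }
        return props;
    }

    // Text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent();
    }

    // Assumption (Hadoop 1.x): an unset or "local" mapred.job.tracker means the local runner.
    static boolean wouldRunLocally(Map<String, String> props) {
        return "local".equals(props.getOrDefault("mapred.job.tracker", "local"));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical mapred-site.xml content pointing at a remote JobTracker.
        String xml = "<configuration><property>"
                + "<name>mapred.job.tracker</name>"
                + "<value>jobtracker.example.com:8021</value>"
                + "</property></configuration>";
        Map<String, String> props =
                readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("local runner? " + wouldRunLocally(props));
    }
}
```

If a check like this reports the local runner for the files that were added, the tracker address is probably missing from them. If it does not, one thing that may be worth ruling out is how the resources were added: Configuration.addResource(String) looks the name up on the classpath, while Configuration.addResource(Path) reads from the local filesystem, so a file path passed as a plain string may not be loaded as intended.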
-
Re: Submitting MapReduce job from remote server using JobClient
Panshul Whisper 2013-01-27, 11:53
Hello Amit,
I tried the same scenario, submitting MapReduce jobs from a system outside the Hadoop cluster, and I used Spring Hadoop to do it. It worked wonderfully. Spring has made a lot of things easier; you can try it. Here is a reference on how to do it: http://www.petrikainulainen.net/programming/apache-hadoop/creating-hadoop-mapreduce-job-with-spring-data-apache-hadoop/

Hope this helps.

Regards,
Ouch Whisper
010101010101