-
Accumulo Map Reduce is not distributed
Cornish, Duane C. 2012-11-02, 20:53

Hello,

I apologize if this discussion should be directed to a Hadoop MapReduce forum; however, I have some concern that my problem may be with my use of Accumulo.

I have a map reduce job that I want to run over data in a table. I have an index table and a support table which contains a subset of the data in the index table. I would like to map reduce over the support table on my small 4 node cluster.

I have written a map reduce job that uses the AccumuloRowInputFormat class and sets the support table as its input table.

In my mapper, I read in a row of the support table and make a call to a static function which pulls information out of the index table. Next, I use the data pulled back from the function call as input to a call to an external .so file that is stored on the name node. I then make another static function call to ingest the new data back into the index table. (I know I could emit this in the reduce step, but what I'm ingesting is formatted in a somewhat complex Java object and I already had a static function that ingested it the way I needed it.) My reduce step is completely empty.

I output print statements from my mapper to see my progress. The problem that I'm getting is that my entire job appears to run in sequence, not in parallel. I am running it from the Accumulo master on the 4 node system.

I realized that my support table is very small and was not being split across any tablets. I am now presplitting this table across all 4 nodes. Now, when I run the map reduce job, it appears that 4 separate map reduce jobs run one after another. The first map reduce job runs, gets to 100%, then the next map reduce job runs, etc. The job is only called once, so why are there 4 jobs running? Why won't these jobs run in parallel?

Is there any way to set the number of tasks that can run? This is possible from the Hadoop command line; is it possible from the Java API? Also, could my problem stem from the fact that during my mapper I am making static function calls to another class in my Java project, accessing my Accumulo index table, or making a call to an external .so library? I could restructure the job to avoid making static function calls, and I could write directly to the Accumulo table from my map reduce job if that would fix my problem. I can't avoid making the external .so library call. Any help would be greatly appreciated.

Thanks,
Duane
-
Re: Accumulo Map Reduce is not distributed
John Vines 2012-11-02, 21:04

This sounds like an issue with how your MR environment is configured and/or how you're kicking off your mapreduce. Accumulo's input formats will automatically set the number of mappers to the number of tablets you have, so you should have seen your job go from 1 mapper to 4. What you describe is that you now run 4 MR jobs instead of just one; is that correct? Because that doesn't make a lot of sense, unless by presplitting your table you meant you now have 4 different support tables. Or do you mean that you're only running one mapper at a time in an MR job that has 4 mappers total?

I believe it's somewhere in your kickoff that things may be a bit misconstrued. Just so I'm clear: how many mapper slots do you have per node, is your job a chain MR job, and do you mind sharing the code which sets up and kicks off your MR job, so I have an idea of what could be kicking off 4 jobs?

John
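The statement that the input format creates one mapper per tablet can be pictured with a small stand-alone model. The sketch below is not Accumulo code: the class `TabletRanges` and the split points "g", "n", "t" are hypothetical, and it assumes plain ASCII row IDs so that Java string ordering matches the table's sort order. It only illustrates that three split points yield four tablets, and that a split point is the last row of the tablet it closes.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical model of how split points partition a table into tablets.
// Not part of the Accumulo API; for illustration only.
final class TabletRanges {
    private final TreeSet<String> splits;

    TabletRanges(String... splitPoints) {
        this.splits = new TreeSet<>(Arrays.asList(splitPoints));
    }

    // n split points divide the row space into n + 1 tablets,
    // and the input format is described as creating one mapper per tablet.
    int tabletCount() {
        return splits.size() + 1;
    }

    // Index of the tablet that holds a row: the number of split points
    // strictly below the row (a split point is the inclusive end of its tablet).
    int tabletIndex(String row) {
        return splits.headSet(row).size();
    }

    public static void main(String[] args) {
        TabletRanges table = new TabletRanges("g", "n", "t"); // assumed split points
        System.out.println("tablets (and mappers): " + table.tabletCount());
        for (String row : new String[] {"apple", "g", "house", "zebra"}) {
            System.out.println(row + " -> tablet " + table.tabletIndex(row));
        }
    }
}
```

Under this model, a table that was never split has a single tablet and therefore a single mapper, which matches the behaviour described before the table was presplit.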
-
RE: Accumulo Map Reduce is not distributed
Cornish, Duane C. 2012-11-02, 21:21

Thanks for the prompt response, John!

When I say that I'm pre-splitting my table, I mean I am using the tableOperations().addSplits(table,splits) command. I have verified that this is correctly splitting my table into 4 tablets and that it is being distributed across my cloud before I start my map reduce job.

Now, I only kick off the job once, but it appears that 4 separate jobs run (one after the other). The first one reaches 100% in its map phase (and based on my output only handled ¼ of the data), then the next job starts at 0% and reaches 100%, and so on. So I think I'm "only running one mapper at a time in an MR job that has 4 mappers total." I have 2 mapper slots per node. My Hadoop is set up so that one machine is the namenode and the other 3 are datanodes. This gives me 6 slots total. (This is not congruent with my Accumulo setup, where the master is also a slave, giving 4 total slaves.)

My map reduce job is not a chain job, so all 4 tablets should be able to run at the same time.

Here is my job class code below:

    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
    import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.log4j.Level;

    public class Accumulo_FE_MR_Job extends Configured implements Tool {

        private void runOneTable() throws Exception {
            System.out.println("Running Map Reduce Feature Extraction Job");

            Job job = new Job(getConf(), getClass().getName());

            job.setJarByClass(getClass());
            job.setJobName("MRFE");

            job.setInputFormatClass(AccumuloRowInputFormat.class);
            AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                    HMaxConstants.INSTANCE,
                    HMaxConstants.ZOO_SERVERS);

            AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
                    HMaxConstants.USER,
                    HMaxConstants.PASSWORD.getBytes(),
                    HMaxConstants.FEATLESS_IMG_TABLE,
                    new Authorizations());

            AccumuloRowInputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);

            job.setMapperClass(AccumuloFEMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(DoubleWritable.class);

            job.setNumReduceTasks(4);
            job.setReducerClass(AccumuloFEReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            job.setOutputFormatClass(AccumuloOutputFormat.class);
            AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
                    HMaxConstants.INSTANCE,
                    HMaxConstants.ZOO_SERVERS);
            AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
                    HMaxConstants.USER,
                    HMaxConstants.PASSWORD.getBytes(),
                    true,
                    HMaxConstants.ALL_IMG_TABLE);
            AccumuloOutputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);

            job.waitForCompletion(true);
            if (job.isSuccessful()) {
                System.err.println("Job Successful");
            } else {
                System.err.println("Job Unsuccessful");
            }
        }

        @Override
        public int run(String[] arg0) throws Exception {
            runOneTable();
            return 0;
        }
    }

Thanks,
Duane
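The slot arithmetic in this message (2 map slots on each of 3 datanodes, 4 mappers) can be checked with a few lines of plain Java. This is only a back-of-the-envelope sketch: the class `SlotMath` and its method names are made up for illustration, and it assumes every slot is free and that the job really is submitted to the cluster.

```java
// Hypothetical helper for estimating map-task concurrency; not a Hadoop API.
final class SlotMath {
    // Total map slots offered by the cluster.
    static int totalSlots(int taskNodes, int slotsPerNode) {
        return taskNodes * slotsPerNode;
    }

    // Upper bound on map tasks that can run at the same moment.
    static int concurrentMappers(int mappers, int totalSlots) {
        return Math.min(mappers, totalSlots);
    }

    // Number of scheduling rounds needed to run all mappers (ceiling division).
    static int waves(int mappers, int totalSlots) {
        if (totalSlots <= 0) {
            throw new IllegalArgumentException("totalSlots must be positive");
        }
        return (mappers + totalSlots - 1) / totalSlots;
    }

    public static void main(String[] args) {
        int slots = totalSlots(3, 2); // 3 datanodes with 2 map slots each
        int mappers = 4;              // one mapper per tablet
        System.out.println("slots = " + slots);
        System.out.println("concurrent mappers = " + concurrentMappers(mappers, slots));
        System.out.println("waves = " + waves(mappers, slots));
    }
}
```

With these figures all 4 mappers fit into a single wave, so strictly sequential execution is unlikely to be explained by a shortage of slots alone.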
-
Re: Accumulo Map Reduce is not distributed
William Slacum 2012-11-03, 00:48

What about the main method that calls ToolRunner.run? If you have 4 jobs being created, then you're calling run(String[]) or runOneTable() 4 times.
-
Re: Accumulo Map Reduce is not distributed
David Medinets 2012-11-03, 03:49

Duane, when you say 4 jobs ... did you mean 4 mappers or 4 Hadoop jobs that appear on the Hadoop Job Tracker page with separate job IDs?
-
RE: Accumulo Map Reduce is not distributed
Cornish, Duane C. 2012-11-05, 13:56

Hi William,

Thanks for helping me out, and sorry I didn't get back to you sooner; I was away for the weekend. I am only calling ToolRunner.run once.

    public static void ExtractFeaturesFromNewImages() throws Exception {
        String[] parameters = new String[1];
        parameters[0] = "foo";
        InitializeFeatureExtractor();
        ToolRunner.run(CachedConfiguration.getInstance(), new Accumulo_FE_MR_Job(), parameters);
    }

Another indicator that I'm only calling it once is that before I was pre-splitting the table, I was just getting one larger map-reduce job with only 1 mapper. Based on my print statements, the job was running in sequence (which I guess makes sense because the table only existed on one node in my cluster). Then, after pre-splitting my table, I was getting one job that had 4 mappers, each running one after the other. I hadn't changed any code (other than adding in the splits). So, I'm only calling ToolRunner.run once. Furthermore, my run function in my job class is provided below:

    @Override
    public int run(String[] arg0) throws Exception {
        runOneTable();
        return 0;
    }

Thanks,
Duane
-
RE: Accumulo Map Reduce is not distributed
John Vines 2012-11-05, 14:13

So it sounds like the job was correctly set to 4 mappers and your issue is in your MapReduce configuration. I would check the jobtracker page and verify the number of map slots, as well as how they're running, as print statements are not the most accurate in the framework.

Sent from my phone, pardon the typos and brevity.
-
Re: Accumulo Map Reduce is not distributed
Billie Rinaldi 2012-11-05, 14:40

On Mon, Nov 5, 2012 at 6:13 AM, John Vines <[EMAIL PROTECTED]> wrote:
> So it sounds like the job was correctly set to 4 mappers and your issue is
> in your MapReduce configuration. I would check the jobtracker page and
> verify the number of map slots, as well as how they're running, as print
> statements are not the most accurate in the framework.

Also make sure your MR job isn't running in local mode. Sometimes that happens if your job can't find the Hadoop configuration directory.

Billie
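Why a missing configuration directory leads to local mode can be sketched as follows. In Hadoop 1.x the property `mapred.job.tracker` defaults to `local`, so a client that never loads the cluster's mapred-site.xml falls back to the in-process runner. The snippet models the configuration as a plain `Map` rather than Hadoop's `Configuration` class so that it runs without any dependencies; the class `LocalModeCheck` and the host name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free model of the Hadoop 1.x local-mode decision.
// The Map stands in for the job's effective configuration.
final class LocalModeCheck {
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // True when the effective setting selects the in-process local runner,
    // which is also the case when the property was never set.
    static boolean isLocalMode(Map<String, String> conf) {
        String tracker = conf.getOrDefault(JOB_TRACKER_KEY, "local");
        return "local".equals(tracker.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Nothing loaded: the default applies and the job would run locally.
        System.out.println("no site config -> local mode? " + isLocalMode(conf));

        // Hypothetical JobTracker address, as a cluster's mapred-site.xml might supply.
        conf.put(JOB_TRACKER_KEY, "jobtracker.example.com:9001");
        System.out.println("with site config -> local mode? " + isLocalMode(conf));
    }
}
```

In a real driver, a similar check could presumably be made by reading the same property from `job.getConfiguration()` before submitting and logging a warning when it resolves to `local`.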
-
RE: Accumulo Map Reduce is not distributed
Cornish, Duane C. 2012-11-05, 14:46

Billie,

I think I just started to come to that same conclusion (I'm relatively new to cloud computing). It appears that it is running in local mode. My console output says "mapred.LocalJobRunner" and the job never appears on my Hadoop Job page. How do I fix this problem? I also found that the "JobTracker" link on my Accumulo Overview page points to http://0.0.0.0:50030/ instead of the actual computer name.

Duane
-
Re: Accumulo Map Reduce is not distributedBillie Rinaldi 2012-11-05, 15:03
On Mon, Nov 5, 2012 at 6:46 AM, Cornish, Duane C. <[EMAIL PROTECTED]> wrote:
> I think I just started to come to that same conclusion (I'm relatively new to cloud computing). It appears that it is running in local mode. My console output says "mapred.LocalJobRunner" and the job never appears on my Hadoop Job page. How do I fix this problem? I also found that the "JobTracker" link on my Accumulo Overview page points to http://0.0.0.0:50030/ instead of the actual computer name.

First check your accumulo-env.sh in the Accumulo conf directory. For the lines that look like the following, change the "/path/to/X" locations to the actual Java, Hadoop, and Zookeeper directories.
test -z "$JAVA_HOME" && export JAVA_HOME=/path/to/java
test -z "$HADOOP_HOME" && export HADOOP_HOME=/path/to/hadoop
test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/path/to/zookeeper
You may also need to make sure that the command you use to run the MR job has JAVA_HOME, HADOOP_HOME, ZOOKEEPER_HOME, and ACCUMULO_HOME environment variables, which can be done by using export commands like the ones above.
Billie
-
Re: Accumulo Map Reduce is not distributedDavid Medinets 2012-11-05, 15:16
I occasionally forget to have the core-site.xml file in my classpath.
Without it, the default Hadoop behavior is to read the local filesystem.
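One quick way to test for the situation David describes is to ask the class loader whether the Hadoop site files are visible at all. The following is a minimal JDK-only sketch, not part of Hadoop or Accumulo; the class name `ConfOnClasspathCheck` and the method `locate` are hypothetical names chosen for illustration. It assumes that Hadoop looks up core-site.xml (and its siblings) by name on the classpath, which is why a missing conf directory produces no error, only default settings.

```java
import java.net.URL;

// Hypothetical diagnostic helper: reports whether the Hadoop site files
// can be found by name on the current classpath.
class ConfOnClasspathCheck {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        return ConfOnClasspathCheck.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        String[] siteFiles = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};
        for (String name : siteFiles) {
            URL url = locate(name);
            // A null URL means the file would not be picked up automatically.
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url.toString()));
        }
    }
}
```

Running this with the same classpath as the job driver should show where each file is being read from, or that it is not found.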
-
RE: Accumulo Map Reduce is not distributedCornish, Duane C. 2012-11-05, 16:54
Billie,
Thanks for the advice. I have had those variables set correctly in accumulo-env.sh. I've been using this cloud for a couple months with no problems (I was not running map reduce jobs on it though). I also just checked and re-exported those environment variables right before I run my Accumulo MR job. I tried outputting the environment variables from within my job class and they resolve correctly.
Does it matter that I am using Accumulo version 1.4.1 and hadoop 1.0.3? I know that Accumulo 1.4.1 was tested with hadoop 0.20.2.
Any further guidance would be greatly appreciated.
Duane
-
Re: Accumulo Map Reduce is not distributedKrishmin Rai 2012-11-05, 17:14
Duane,
I've run into a similar issue before: jobs were always being run locally and not being submitted to the job tracker. The fix in our case was to make sure that we explicitly added the mapred-site.xml file to the configuration object before creating the job. Something like:
conf.addResource(new Path(<path_to_mapred-site.xml>));
-Krishmin
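Before adding mapred-site.xml to the configuration as Krishmin suggests, it can help to confirm which JobTracker address that file actually contains. The sketch below is a JDK-only illustration, not Hadoop API; the class `SiteXmlProbe`, its methods, and the host name in `main` are hypothetical. It assumes the Hadoop 1.x convention that a missing `mapred.job.tracker` value, or the value `local`, selects the LocalJobRunner.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads one property out of a Hadoop-style *-site.xml document.
class SiteXmlProbe {

    /** Returns the value of the property with the given name, or null if it is absent. */
    static String readProperty(InputStream siteXml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && key.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse site XML", e);
        }
    }

    /** Assumption (Hadoop 1.x): no value, or "local", means the job runs in-process. */
    static boolean wouldRunLocally(String jobTracker) {
        return jobTracker == null || jobTracker.equals("local");
    }

    public static void main(String[] args) {
        // Made-up example content; a real check would open the actual mapred-site.xml.
        String xml = "<configuration><property><name>mapred.job.tracker</name>"
                + "<value>jobtracker.example.com:9001</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        String jobTracker = readProperty(in, "mapred.job.tracker");
        System.out.println("mapred.job.tracker = " + jobTracker
                + ", local mode: " + wouldRunLocally(jobTracker));
    }
}
```

To inspect a real file, pass a `java.io.FileInputStream` opened on the mapred-site.xml path instead of the in-memory string.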
-
Re: Accumulo Map Reduce is not distributedBillie Rinaldi 2012-11-05, 17:18
On Mon, Nov 5, 2012 at 8:54 AM, Cornish, Duane C. <[EMAIL PROTECTED]> wrote:
> Thanks for the advice. I have had those variables set correctly in accumulo-env.sh. I've been using this cloud for a couple months with no problems (I was not running map reduce jobs on it though). I also just checked and re-exported those environment variables right before I run my Accumulo MR job. I tried outputting the environment variables from within my job class and they resolve correctly.
> Does it matter that I am using Accumulo version 1.4.1 and hadoop 1.0.3? I know that Accumulo 1.4.1 was tested with hadoop 0.20.2.
> Any further guidance would be greatly appreciated.

Hadoop 1.0.3 should be fine. It's likely to be what David Medinets suggested. To get the Hadoop conf on your classpath, try something like the following (assuming you're running your job with "hadoop jar"):
export HADOOP_CLASSPATH=$HADOOP_HOME/conf:$HADOOP_CLASSPATH
Billie
-
RE: Accumulo Map Reduce is not distributedCornish, Duane C. 2012-11-06, 13:45
Thanks for all of the help on this. Your comments led me down the right path. I'll explain what I did to fix it for reference purposes in the email archive. My map reduce job was running locally because it did not have the hadoop configuration. I was attempting to kick off my map reduce job from within a larger program that I was running via the "java -jar" command. I think if I had kicked off the job with the "hadoop jar" command it would have worked. To set the correct configuration in my job, I set my configuration manually with the following lines:
Configuration conf = getConf();
conf.addResource("path_to_mapred-site.xml");
conf.addResource("path_to_core-site.xml");
conf.addResource("path_to_hdfs-site.xml");
// mapred.job.tracker as defined in mapred-site.xml
conf.set("mapred.job.tracker", <value from mapred.job.tracker>);
// fs.default.name as defined in core-site.xml
conf.set("fs.default.name", <value from core-site.xml>);
Beforehand, my job was not showing up in the task tracker. Now it shows up correctly and completes successfully.
Thanks again!
Duane
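One detail worth checking in Duane's fix above: in Hadoop's Configuration class, addResource(String) treats its argument as a classpath resource name, while addResource(Path) reads the file from the local filesystem, so a filesystem path is normally wrapped in new Path(...), as in Krishmin's example. The sketch below is a JDK-only illustration of resolving the site files from HADOOP_HOME rather than hard-coding them, and of failing loudly if one is missing instead of silently falling back to local mode. The class `HadoopSiteFiles` and its `require` method are hypothetical, and the `$HADOOP_HOME/conf` layout is an assumption taken from the export line earlier in the thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: locates the Hadoop site files under a conf directory.
class HadoopSiteFiles {

    static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};

    /** Returns the site files under confDir, throwing if any of them is missing. */
    static List<File> require(File confDir) {
        List<File> found = new ArrayList<>();
        for (String name : SITE_FILES) {
            File f = new File(confDir, name);
            if (!f.isFile()) {
                throw new IllegalStateException("Missing Hadoop config file: " + f);
            }
            found.add(f);
        }
        return found;
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        // Assumed layout: site files live in $HADOOP_HOME/conf.
        for (File f : require(new File(hadoopHome, "conf"))) {
            // In the job driver, each file could then be added with something like:
            //   conf.addResource(new org.apache.hadoop.fs.Path(f.getAbsolutePath()));
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

With the files added this way, the explicit conf.set calls for mapred.job.tracker and fs.default.name should no longer be needed, since those values come from the site files themselves.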
-
Re: Accumulo Map Reduce is not distributedDavid Medinets 2012-11-06, 14:34
I'm glad that you got your job working by hard-coding your
configuration files as resources. However, setting your classpath so the files are automatically found would make your software more flexible.
-
RE: Accumulo Map Reduce is not distributedCornish, Duane C. 2012-11-06, 14:53
David,
Thanks for pursuing this (I know you brought it up yesterday and Billie had endorsed it). I just tried it and it works as well. I agree, it does make the software more flexible. Yesterday, I didn't pursue it as I wasn't positive how to do it initially. Then Billie offered the actual export line but mentioned that it was assuming I was running my job via the "hadoop jar" command instead of "java -jar". Anyhow, thanks for all the support! I'm relatively new to mapreduce.
-
Re: Accumulo Map Reduce is not distributedBillie Rinaldi 2012-11-06, 15:19
On Nov 6, 2012, at 9:53 AM, "Cornish, Duane C." <[EMAIL PROTECTED]> wrote:
> Thanks for pursuing this (I know you brought it up yesterday and Billie had endorsed it). I just tried it and it works as well. I agree, it does make the software more flexible. Yesterday, I didn't pursue it as I wasn't positive how to do it initially. Then Billie offered the actual export line but mentioned that it was assuming I was running my job via the "hadoop jar" command instead of "java -jar".

Duane,
Yes, sorry I didn't tell you how to do it when running with java. I'll record it here for posterity: you should just be able to add the hadoop conf dir to the java classpath, i.e. java -cp $HADOOP_HOME/conf etc. Glad you got it working!
Billie
|