Accumulo >> mail # user >> Accumulo Map Reduce is not distributed


Earlier messages in this thread:
Cornish, Duane C. 2012-11-02, 20:53
John Vines 2012-11-02, 21:04
Cornish, Duane C. 2012-11-02, 21:21
William Slacum 2012-11-03, 00:48
David Medinets 2012-11-03, 03:49
Cornish, Duane C. 2012-11-05, 13:56
John Vines 2012-11-05, 14:13
Billie Rinaldi 2012-11-05, 14:40
Cornish, Duane C. 2012-11-05, 14:46
Billie Rinaldi 2012-11-05, 15:03
Cornish, Duane C. 2012-11-05, 16:54
Krishmin Rai 2012-11-05, 17:14
Billie Rinaldi 2012-11-05, 17:18

RE: Accumulo Map Reduce is not distributed
Thanks for all of the help on this.  Your comments led me down the right path.  I'll explain what I did to fix it, for reference in the email archive.

My MapReduce job was running locally because it did not have the Hadoop configuration.  I was kicking off the job from within a larger program that I was running via the "java -jar" command.  I think it would have worked if I had kicked off the job with the "hadoop jar" command.  To give the job the correct configuration, I set it manually with the following lines:

    Configuration conf = getConf();
    // Note: addResource(String) resolves the name against the classpath;
    // use addResource(new Path(...)) to load a file from the local filesystem.
    conf.addResource("path_to_mapred-site.xml");
    conf.addResource("path_to_core-site.xml");
    conf.addResource("path_to_hdfs-site.xml");

    // mapred.job.tracker as defined in mapred-site.xml
    conf.set("mapred.job.tracker", <value from mapred.job.tracker>);

    // fs.default.name as defined in core-site.xml
    conf.set("fs.default.name", <value from core-site.xml>);

Beforehand, my job was not showing up in the task tracker.  Now it shows up correctly and completes successfully.

Thanks again!
Duane

From: Billie Rinaldi [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 05, 2012 12:18 PM
To: [EMAIL PROTECTED]
Subject: Re: Accumulo Map Reduce is not distributed

On Mon, Nov 5, 2012 at 8:54 AM, Cornish, Duane C. <[EMAIL PROTECTED]> wrote:
Billie,

Thanks for the advice.  I have had those variables set correctly in accumulo-env.sh.  I've been using this cloud for a couple months with no problems (I was not running map reduce jobs on it though).  I also just checked and re-exported those environment variables right before I run my Accumulo MR job.  I tried outputting the environment variables from within my job class and they resolve correctly.

Does it matter that I am using Accumulo version 1.4.1 and Hadoop 1.0.3?  I know that Accumulo 1.4.1 was tested with Hadoop 0.20.2.

Any further guidance would be greatly appreciated.

Hadoop 1.0.3 should be fine.  It's likely to be what David Medinets suggested.  To get the Hadoop conf on your classpath, try something like the following (assuming you're running your job with "hadoop jar"):

export HADOOP_CLASSPATH=$HADOOP_HOME/conf:$HADOOP_CLASSPATH
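
A minimal sketch of the export above, assuming a POSIX shell. It prepends the conf directory only when the directory exists and is not already listed, so running it repeatedly does not grow the variable. The function name prepend_to_hadoop_classpath is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): prepend a directory to
# HADOOP_CLASSPATH unless it is missing on disk or already listed.
prepend_to_hadoop_classpath() {
  # Reject paths that are not directories so a typo is caught early.
  if [ ! -d "$1" ]; then
    echo "not a directory: $1" >&2
    return 1
  fi
  case ":${HADOOP_CLASSPATH:-}:" in
    *":$1:"*) ;;  # already on the path, leave it unchanged
    *) export HADOOP_CLASSPATH="$1${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}" ;;
  esac
}

# Usage: prepend_to_hadoop_classpath "$HADOOP_HOME/conf"
```

The existence check is there because a wrong conf path would otherwise go unnoticed until the job runs.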

Billie

Duane

From: Billie Rinaldi [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 05, 2012 10:04 AM
To: [EMAIL PROTECTED]
Subject: Re: Accumulo Map Reduce is not distributed

On Mon, Nov 5, 2012 at 6:46 AM, Cornish, Duane C. <[EMAIL PROTECTED]> wrote:
Billie,

I think I just started to come to that same conclusion (I'm relatively new to cloud computing).  It appears that it is running in local mode.  My console output says "mapred.LocalJobRunner" and the job never appears on my Hadoop Job page.  How do I fix this problem?  I also found that the "JobTracker" link on my Accumulo Overview page points to http://0.0.0.0:50030/ instead of the actual computer name.

First check your accumulo-env.sh in the Accumulo conf directory.  For the lines that look like the following, change the "/path/to/X" locations to the actual Java, Hadoop, and Zookeeper directories.

test -z "$JAVA_HOME"             && export JAVA_HOME=/path/to/java
test -z "$HADOOP_HOME"           && export HADOOP_HOME=/path/to/hadoop
test -z "$ZOOKEEPER_HOME"        && export ZOOKEEPER_HOME=/path/to/zookeeper

You may also need to make sure that the command you use to run the MR job has JAVA_HOME, HADOOP_HOME, ZOOKEEPER_HOME, and ACCUMULO_HOME environment variables, which can be done by using export commands like the ones above.
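
As a rough pre-flight check along these lines, the sketch below reports any of the named variables that are unset or that do not point at a directory. It assumes a POSIX shell; check_env_dirs is a hypothetical helper name, not something shipped with Accumulo or Hadoop.

```shell
# Hypothetical pre-flight check: for each variable name given, report
# when it is unset/empty or when its value is not an existing directory.
check_env_dirs() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_env_dirs JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME ACCUMULO_HOME
```

Running it in the same shell that launches the MR job checks the environment the job will actually inherit.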

Billie

Duane

From: Billie Rinaldi [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 05, 2012 9:41 AM
To: [EMAIL PROTECTED]
Subject: Re: Accumulo Map Reduce is not distributed

On Mon, Nov 5, 2012 at 6:13 AM, John Vines <[EMAIL PROTECTED]> wrote:

So it sounds like the job was correctly set to 4 mappers and your issue is in your MapReduce configuration. I would check the jobtracker page and verify the number of map slots, as well as how they're running, as print statements are not the most accurate in the framework.

Also make sure your MR job isn't running in local mode.  Sometimes that happens if your job can't find the Hadoop configuration directory.
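
One rough way to check for this, assuming a Hadoop 1.x setup where mapred.job.tracker defaults to "local" when mapred-site.xml does not set it. The sketch reads the property with sed rather than a real XML parser, so it assumes the <value> element directly follows the <name> element. Both function names are hypothetical.

```shell
# Rough sketch: print the value of mapred.job.tracker from a site file.
# Assumes <value> immediately follows <name> (no <description> between).
job_tracker_from_site_xml() {
  # Join the file into one line so the pattern can span line breaks.
  tr -d '\n' < "$1" |
    sed -n 's:.*<name>mapred\.job\.tracker</name>[^<]*<value>\([^<]*\)</value>.*:\1:p'
}

# Succeeds when the file leaves the job tracker unset or set to "local",
# which is the case where Hadoop 1.x falls back to the LocalJobRunner.
runs_in_local_mode() {
  tracker=$(job_tracker_from_site_xml "$1")
  [ -z "$tracker" ] || [ "$tracker" = "local" ]
}

# Usage: runs_in_local_mode "$HADOOP_HOME/conf/mapred-site.xml" && echo "local mode"
```

This only inspects the file on disk. It says nothing about whether the job's classpath actually includes that file, which still has to be confirmed separately.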

Billie

Sent from my phone, pardon the typos and brevity.
On Nov 5, 2012 8:59 AM, "Cornish, Duane C." <[EMAIL PROTECTED]> wrote:
Hi William,

Thanks for helping me out, and sorry I didn't get back to you sooner; I was away for the weekend.  I am only calling ToolRunner.run once.

    public static void ExtractFeaturesFromNewImages() throws Exception {
        String[] parameters = new String[1];
        parameters[0] = "foo";
        InitializeFeatureExtractor();
        ToolRunner.run(CachedConfiguration.getInstance(), new Accumulo_FE_MR_Job(), parameters);
    }

Another indicator that I'm only calling it once is that before I pre-split the table, I was getting just one larger MapReduce job with only 1 mapper.  Based on my print statements, the job was running in sequence (which I guess makes sense because the table only existed on one node in my cluster).  Then, after pre-splitting my table, I was getting one job that had 4 mappers, each running one after the other.  I hadn't changed any code (other than adding the splits).  So, I'm only calling ToolRunner.run once.  Furthermore, the run function in my job class is provided below:

    @Override
    public int run(String[] arg0) throws Exception {
        runOneTable();
        return 0;
    }

Thanks,
Duane
From: William Slacum [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 02, 2012 8:48 PM

Other messages in this thread:
David Medinets 2012-11-06, 14:34
Cornish, Duane C. 2012-11-06, 14:53
Billie Rinaldi 2012-11-06, 15:19
David Medinets 2012-11-05, 15:16