Accumulo, mail # user - Accumulo Map Reduce is not distributed


Cornish, Duane C. 2012-11-02, 20:53
John Vines 2012-11-02, 21:04
Cornish, Duane C. 2012-11-02, 21:21
William Slacum 2012-11-03, 00:48
David Medinets 2012-11-03, 03:49
RE: Accumulo Map Reduce is not distributed
Cornish, Duane C. 2012-11-05, 13:56
Hi William,

Thanks for helping me out, and sorry I didn't get back to you sooner; I was away for the weekend.  I am only calling ToolRunner.run once.

public static void ExtractFeaturesFromNewImages() throws Exception {
    String[] parameters = new String[1];
    parameters[0] = "foo";
    InitializeFeatureExtractor();
    ToolRunner.run(CachedConfiguration.getInstance(), new Accumulo_FE_MR_Job(), parameters);
}

Another indicator that I'm only calling it once: before I pre-split the table, I was getting one larger map-reduce job with only 1 mapper.  Based on my print statements, the job was running in sequence (which I guess makes sense, because the table only existed on one node in my cluster).  Then, after pre-splitting my table, I was getting one job that had 4 mappers, each running one after the other.  I hadn't changed any code other than adding in the splits, so I'm only calling ToolRunner.run once.  Furthermore, the run method in my job class is provided below:

@Override
public int run(String[] arg0) throws Exception {
    runOneTable();
    return 0;
}

Thanks,
Duane
From: William Slacum [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 02, 2012 8:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Accumulo Map Reduce is not distributed

What about the main method that calls ToolRunner.run? If you have 4 jobs being created, then you're calling run(String[]) or runOneTable() 4 times.
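[Editor's note: one quick way to verify William's point is to instrument the driver. The sketch below is hypothetical debugging code, not from the thread; it stands in for the real Tool subclass and simply counts how many times run(String[]) is invoked, so a driver that is accidentally looping would report a count greater than 1.]

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumentation: count invocations of run(String[]).
// In the real job class, the same static counter could be incremented
// at the top of run() to confirm the driver calls it exactly once.
class RunCounter {
    static final AtomicInteger CALLS = new AtomicInteger();

    // stand-in for Accumulo_FE_MR_Job.run(String[])
    static int instrumentedRun(String[] args) {
        int n = CALLS.incrementAndGet();
        System.out.println("run() invocation #" + n);
        return 0;
    }

    public static void main(String[] args) {
        instrumentedRun(new String[0]); // a correct driver calls this once
        System.out.println("total invocations: " + CALLS.get());
    }
}
```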
On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C. <[EMAIL PROTECTED]> wrote:
Thanks for the prompt response John!
When I say that I'm pre-splitting my table, I mean I am using the tableOperations().addSplits(table, splits) command.  I have verified that this correctly splits my table into 4 tablets, and that the tablets are distributed across my cloud before I start my map-reduce job.
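[Editor's note: for readers following along, here is a minimal sketch of building the split points that addSplits expects. The helper name and the hex row-space are assumptions; the real API, TableOperations.addSplits(String, SortedSet&lt;Text&gt;), takes org.apache.hadoop.io.Text values, so each string below would be wrapped in a Text before the call.]

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Build n-1 evenly spaced split points over a single-hex-digit row
// prefix, yielding n tablets. For n = 4 this produces "4", "8", "c".
class SplitPoints {
    static SortedSet<String> evenHexSplits(int tablets) {
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(Integer.toHexString(i * 16 / tablets));
        }
        return splits;
    }
}
```

Each string would then be wrapped (e.g. new Text(s)) into a SortedSet&lt;Text&gt; and passed to connector.tableOperations().addSplits(tableName, textSplits).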

Now, I only kick off the job once, but it appears that 4 separate jobs run (one after the other).  The first one reaches 100% in its map phase (and, based on my output, only handled ¼ of the data), then the next job starts at 0% and reaches 100%, and so on.  So I think I'm "only running one mapper at a time in an MR job that has 4 mappers total."  I have 2 mapper slots per node.  My Hadoop is set up so that one machine is the namenode and the other 3 are datanodes, which gives me 6 slots total.  (This differs from my Accumulo setup, where the master is also a slave, giving 4 slaves total.)

My map reduce job is not a chain job, so all 4 tablets should be able to run at the same time.

Here is my job class code below:

import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.log4j.Level;

public class Accumulo_FE_MR_Job extends Configured implements Tool {

    private void runOneTable() throws Exception {
        System.out.println("Running Map Reduce Feature Extraction Job");

        Job job = new Job(getConf(), getClass().getName());

        job.setJarByClass(getClass());
        job.setJobName("MRFE");

        job.setInputFormatClass(AccumuloRowInputFormat.class);
        AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                HMaxConstants.INSTANCE,
                HMaxConstants.ZOO_SERVERS);

        AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
                HMaxConstants.USER,
                HMaxConstants.PASSWORD.getBytes(),
                HMaxConstants.FEATLESS_IMG_TABLE,
                new Authorizations());

        AccumuloRowInputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);

        job.setMapperClass(AccumuloFEMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        job.setNumReduceTasks(4);
        job.setReducerClass(AccumuloFEReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setOutputFormatClass(AccumuloOutputFormat.class);
        AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
                HMaxConstants.INSTANCE,
                HMaxConstants.ZOO_SERVERS);
        AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
                HMaxConstants.USER,
                HMaxConstants.PASSWORD.getBytes(),
                true,
                HMaxConstants.ALL_IMG_TABLE);

        AccumuloOutputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);

        job.waitForCompletion(true);
        if (job.isSuccessful()) {
            System.err.println("Job Successful");
        } else {
            System.err.println("Job Unsuccessful");
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        runOneTable();
        return 0;
    }
}

Thanks,
Duane

From: John Vines [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 02, 2012 5:04 PM
To: [EMAIL PROTECTED]
Subject: Re: Accumulo Map Reduce is not distributed

This sounds like an issue with how your MR environment is configured and/or how you're kicking off your mapreduce.

Accumulo's input formats will automatically set the number of mappers to the number of tablets you have, so you should have seen your job go from 1 mapper to 4. What you describe is that you now run 4 MR jobs instead of just one, is that correct? Because that doesn't make a lot of sense, unless by pre-splitting your table you mean you now have 4 different support tables. Or do you mean that you're only running one ma
John Vines 2012-11-05, 14:13
Billie Rinaldi 2012-11-05, 14:40
Cornish, Duane C. 2012-11-05, 14:46
Billie Rinaldi 2012-11-05, 15:03
Cornish, Duane C. 2012-11-05, 16:54
Krishmin Rai 2012-11-05, 17:14
Billie Rinaldi 2012-11-05, 17:18
Cornish, Duane C. 2012-11-06, 13:45
David Medinets 2012-11-06, 14:34
Cornish, Duane C. 2012-11-06, 14:53
Billie Rinaldi 2012-11-06, 15:19
David Medinets 2012-11-05, 15:16