Accumulo, mail # user - Accumulo Map Reduce is not distributed

RE: Accumulo Map Reduce is not distributed
John Vines 2012-11-05, 14:13
So it sounds like the job was correctly set to 4 mappers, and your issue is
in your MapReduce configuration. I would check the jobtracker page and
verify the number of map slots, as well as how they're being used, since
print statements are not the most reliable way to observe what the
framework is doing.
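
For example, to check the slot counts programmatically rather than with
print statements, something like this works against the old mapred API (a
minimal sketch; the class name is mine, and it assumes the jobtracker from
your mapred-site.xml):

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class PrintClusterSlots {
        public static void main(String[] args) throws Exception {
            // Asks the jobtracker for live cluster state.
            JobClient client = new JobClient(new JobConf());
            ClusterStatus status = client.getClusterStatus();
            System.out.println("tasktrackers: " + status.getTaskTrackers());
            // Map slots currently in use vs. the cluster-wide maximum.
            System.out.println("map slots: " + status.getMapTasks()
                    + " / " + status.getMaxMapTasks());
        }
    }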

Sent from my phone, pardon the typos and brevity.
On Nov 5, 2012 8:59 AM, "Cornish, Duane C." <[EMAIL PROTECTED]>
wrote:

> Hi William,
>
> Thanks for helping me out, and sorry I didn't get back to you sooner; I
> was away for the weekend. I am only calling ToolRunner.run once.
>
> public static void ExtractFeaturesFromNewImages() throws Exception {
>     String[] parameters = new String[1];
>     parameters[0] = "foo";
>     InitializeFeatureExtractor();
>     ToolRunner.run(CachedConfiguration.getInstance(),
>             new Accumulo_FE_MR_Job(), parameters);
> }
>
> Another indicator that I'm only calling it once is that before I was
> pre-splitting the table, I was getting one larger map-reduce job with
> only 1 mapper. Based on my print statements, the job was running in
> sequence (which I guess makes sense, because the table only existed on
> one node in my cluster). Then, after pre-splitting my table, I was
> getting one job that had 4 mappers, each running one after the other. I
> hadn't changed any code other than adding in the splits, so I'm only
> calling ToolRunner.run once. Furthermore, the run method from my job
> class is provided below:
>
> @Override
> public int run(String[] arg0) throws Exception {
>     runOneTable();
>     return 0;
> }
>
> Thanks,
>
> Duane
>
> From: William Slacum [mailto:[EMAIL PROTECTED]]
> Sent: Friday, November 02, 2012 8:48 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Accumulo Map Reduce is not distributed
>
> What about the main method that calls ToolRunner.run? If you have 4 jobs
> being created, then you're calling run(String[]) or runOneTable() 4 times.
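>
> For contrast, a driver that submits the job exactly once usually looks
> something like this (a sketch; error handling and argument parsing
> omitted):
>
>     public static void main(String[] args) throws Exception {
>         // One ToolRunner.run call submits one MR job; calling it in a
>         // loop (or calling runOneTable() repeatedly) would give you one
>         // job per call.
>         int rc = ToolRunner.run(CachedConfiguration.getInstance(),
>                 new Accumulo_FE_MR_Job(), args);
>         System.exit(rc);
>     }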
>
> On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C. <
> [EMAIL PROTECTED]> wrote:
>
> Thanks for the prompt response, John!
>
> When I say that I'm pre-splitting my table, I mean I am using the
> tableOperations().addSplits(table, splits) command. I have verified that
> this is correctly splitting my table into 4 tablets and that the tablets
> are distributed across my cloud before I start my map-reduce job.
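>
> For reference, the pre-split looks roughly like this (a minimal sketch;
> the table name and split points here are illustrative, not my real ones):
>
>     import java.util.SortedSet;
>     import java.util.TreeSet;
>     import org.apache.accumulo.core.client.Connector;
>     import org.apache.hadoop.io.Text;
>
>     public static void preSplit(Connector conn, String table)
>             throws Exception {
>         // 3 split points yield 4 tablets, and one mapper per tablet.
>         SortedSet<Text> splits = new TreeSet<Text>();
>         splits.add(new Text("row_25"));
>         splits.add(new Text("row_50"));
>         splits.add(new Text("row_75"));
>         conn.tableOperations().addSplits(table, splits);
>     }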
>
> Now, I only kick off the job once, but it appears that 4 separate jobs
> run (one after the other). The first one reaches 100% in its map phase
> (and, based on my output, only handled 1/4 of the data), then the next
> job starts at 0% and reaches 100%, and so on. So I think I'm "only
> running one mapper at a time in an MR job that has 4 mappers total." I
> have 2 mapper slots per node. My Hadoop cluster is set up so that one
> machine is the namenode and the other 3 are datanodes, which gives me 6
> map slots total. (This is not congruent with my Accumulo setup, where the
> master is also a slave, giving 4 slaves total.)
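>
> (For the record, the 2-slots-per-node figure is the tasktracker setting;
> a quick sketch of the arithmetic, assuming the MRv1 property name:)
>
>     import org.apache.hadoop.conf.Configuration;
>
>     public class SlotMath {
>         public static void main(String[] args) {
>             Configuration conf = new Configuration();
>             // MRv1 per-tasktracker map slot count (default 2).
>             int perNode =
>                     conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
>             int dataNodes = 3;  // the namenode runs no tasktracker here
>             System.out.println("total map slots: " + dataNodes * perNode);
>         }
>     }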
>
> My map-reduce job is not a chain job, so all 4 mappers (one per tablet)
> should be able to run at the same time.
>
> My job class code is below:
>
> import org.apache.accumulo.core.security.Authorizations;
> import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
> import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.io.DoubleWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.util.Tool;
> import org.apache.log4j.Level;
>
> public class Accumulo_FE_MR_Job extends Configured implements Tool {