Re: Accumulo Map Reduce is not distributed
On Mon, Nov 5, 2012 at 6:13 AM, John Vines <[EMAIL PROTECTED]> wrote:

> So it sounds like the job was correctly set to 4 mappers and your issue is
> in your MapReduce configuration. I would check the jobtracker page and
> verify the number of map slots, as well as how they're running, as print
> statements are not the most accurate in the framework.
>

Also make sure your MR job isn't running in local mode.  Sometimes that
happens if your job can't find the Hadoop configuration directory.
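To illustrate the fallback Billie describes, here is a minimal sketch (assumptions: Hadoop 1.x, where the deciding key is "mapred.job.tracker" and it defaults to "local" when no cluster configuration is found; a plain java.util.Properties stands in for Hadoop's Configuration so the sketch runs with the JDK alone):

```java
import java.util.Properties;

public class LocalModeCheck {
    // Hadoop 1.x falls back to "local" when mapred.job.tracker is unset,
    // which is what happens when the conf directory isn't on the classpath.
    static boolean isLocalMode(Properties conf) {
        return "local".equals(conf.getProperty("mapred.job.tracker", "local"));
    }

    public static void main(String[] args) {
        Properties empty = new Properties();        // no config found
        System.out.println(isLocalMode(empty));     // true: one in-process mapper

        Properties clustered = new Properties();
        clustered.setProperty("mapred.job.tracker", "namenode:9001"); // assumed address
        System.out.println(isLocalMode(clustered)); // false: distributed
    }
}
```

In a real job the same check amounts to printing `conf.get("mapred.job.tracker")` before submitting; if it reads "local", the job will run in-process with a single mapper regardless of splits.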

Billie

> Sent from my phone, pardon the typos and brevity.
> On Nov 5, 2012 8:59 AM, "Cornish, Duane C." <[EMAIL PROTECTED]>
> wrote:
>
>> Hi William,
>>
>> Thanks for helping me out, and sorry I didn’t get back to you sooner; I
>> was away for the weekend.  I am only calling ToolRunner.run once.
>>
>>
>> public static void ExtractFeaturesFromNewImages() throws Exception {
>>     String[] parameters = new String[1];
>>     parameters[0] = "foo";
>>     InitializeFeatureExtractor();
>>     ToolRunner.run(CachedConfiguration.getInstance(),
>>             new Accumulo_FE_MR_Job(), parameters);
>> }
>>
>>
>> Another indicator that I’m only calling it once is that before I was
>> pre-splitting the table, I was getting one larger map-reduce job with
>> only 1 mapper.  Based on my print statements, the job was running in
>> sequence (which I guess makes sense because the table only existed on one
>> node in my cluster).  Then after pre-splitting my table, I was getting one
>> job that had 4 mappers, each running one after the other.  I hadn’t
>> changed any code (other than adding in the splits), so I’m only calling
>> ToolRunner.run once.  Furthermore, the run function in my job class is
>> provided below:
>>
>>
>> @Override
>> public int run(String[] arg0) throws Exception {
>>     runOneTable();
>>     return 0;
>> }
>>
>>
>> Thanks,
>> Duane
>>
>> From: William Slacum [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, November 02, 2012 8:48 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Accumulo Map Reduce is not distributed
>>
>> What about the main method that calls ToolRunner.run? If you have 4 jobs
>> being created, then you're calling run(String[]) or runOneTable() 4 times.
>>
>> On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C. <
>> [EMAIL PROTECTED]> wrote:
>>
>> Thanks for the prompt response, John!
>>
>> When I say that I’m pre-splitting my table, I mean I am using the
>> tableOperations().addSplits(table, splits) command.  I have verified that
>> this correctly splits my table into 4 tablets and that they are being
>> distributed across my cloud before I start my map reduce job.
>>
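The pre-split step described above can be sketched as follows (this is not Duane's actual code; the split values and table name are assumed for illustration, and the addSplits call itself is left as a comment since it needs a live Connector, so the runnable part only builds the split set):

```java
import java.util.TreeSet;

public class PreSplit {
    // Build 3 cut points that divide a hex-keyed table into 4 tablets.
    static TreeSet<String> fourWaySplits() {
        TreeSet<String> splits = new TreeSet<>();
        splits.add("4");
        splits.add("8");
        splits.add("c");
        return splits;
    }

    public static void main(String[] args) {
        TreeSet<String> splits = fourWaySplits();
        System.out.println(splits); // [4, 8, c]
        // With a live connection you would wrap each row in an
        // org.apache.hadoop.io.Text and hand the set to Accumulo:
        // connector.tableOperations().addSplits("myTable", textSplits);
    }
}
```

With AccumuloRowInputFormat, each tablet becomes one input split, which is why 4 tablets yield 4 mappers.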
>>
>> Now, I only kick off the job once, but it appears that 4 separate jobs
>> run, one after the other.  The first one reaches 100% in its map phase
>> (and based on my output only handled ¼ of the data), then the next job
>> starts at 0% and reaches 100%, and so on.  So I think I’m “only running
>> one mapper at a time in an MR job that has 4 mappers total.”  I have 2
>> mapper slots per node.  My hadoop is set up so that one machine is the
>> namenode and the other 3 are datanodes.  This gives me 6 map slots total.
>> (This is not congruent with my accumulo setup, where the master is also a
>> slave, giving 4 total slaves.)
>>
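For reference, the per-node slot count described above is set in mapred-site.xml on each TaskTracker (a hedged example; the value 2 matches the setup described, and should be tuned per machine):

```xml
<!-- mapred-site.xml on each TaskTracker: 2 map slots per node.
     With 3 datanodes this yields 6 slots, enough for 4 concurrent mappers. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```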
>> My map reduce job is not a chain job, so the mappers for all 4 tablets
>> should be able to run at the same time.
>>
>>
>> Here is my job class code below:
>>
>> import org.apache.accumulo.core.security.Authorizations;
>> import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
>> import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat