Accumulo, mail # user - Programmatically invoking a Map/Reduce job


Mike Hugo 2013-01-16, 17:11
John Vines 2013-01-16, 17:20
Mike Hugo 2013-01-16, 20:07
Billie Rinaldi 2013-01-16, 21:11
Mike Hugo 2013-01-17, 19:16
Re: Programmatically invoking a Map/Reduce job
Billie Rinaldi 2013-01-17, 19:57
On Thu, Jan 17, 2013 at 11:16 AM, Mike Hugo <[EMAIL PROTECTED]> wrote:

> Thanks Billie!
>
> Setting "mapred.job.tracker" and "fs.default.name" in the conf has gotten
> me further.
>
>         job.getConfiguration().set("mapred.job.tracker", "server_name_here:8021");
>         job.getConfiguration().set("fs.default.name", "hdfs://server_name_here:8020");
>
> What's interesting now is that the job can't find Accumulo classes - when
> I run the job now, I get
>
> 2013-01-17 12:59:25,278 [main] INFO  mapred.JobClient  - Task Id : attempt_201301171102_0012_m_000000_1, Status : FAILED
> java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat
>
> Is there a way to inform the job (via the Job API, on a separate machine
> not running hadoop) about extra libs to include on the classpath of the job?
>

You normally inform a job about jars it needs by specifying "-libjars
comma,separated,jar,list" on the command line.  In this case, you need to
put those two strings "-libjars" and "jar,list" in the String[] args passed
to ToolRunner.run:
ToolRunner.run(CachedConfiguration.getInstance(), new ...(), args)

The accumulo-core jar probably isn't the only one you'll need.
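
As a rough sketch of the args handling described above, the array could be assembled like this before it is handed to ToolRunner.run. The class name LibJarsArgs, the helper withLibJars, the jar paths, the job arguments, and MyJob in the trailing comment are all hypothetical placeholders. The sketch assumes the generic options should come before the job's own arguments, and that the Tool builds its Job from the Configuration that ToolRunner gives it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    /**
     * Builds the String[] to pass to ToolRunner.run: "-libjars", then a
     * comma-separated jar list, then the job's own arguments.
     * (Hypothetical helper, not part of Hadoop or Accumulo.)
     */
    public static String[] withLibJars(List<String> jarPaths, String... jobArgs) {
        List<String> args = new ArrayList<>();
        if (!jarPaths.isEmpty()) {
            args.add("-libjars");
            // One argument containing every jar, separated by commas
            args.add(String.join(",", jarPaths));
        }
        args.addAll(Arrays.asList(jobArgs));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder jar locations on the client machine
        String[] args = withLibJars(
                Arrays.asList("/path/to/accumulo-core.jar", "/path/to/other-dependency.jar"),
                "jobArg1", "jobArg2");

        System.out.println(String.join(" ", args));

        // With Hadoop and Accumulo on the client classpath, the array would
        // then be used as in the call above (MyJob is a placeholder Tool):
        // ToolRunner.run(CachedConfiguration.getInstance(), new MyJob(), args);
    }
}
```

The helper only uses the JDK, so it can be tried without a cluster; the ToolRunner call itself is left as a comment because it needs the Hadoop and Accumulo jars on the client.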

Billie
>
> Thanks
>
> Mike
>
>
>
> On Wed, Jan 16, 2013 at 3:11 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
>
>> Your job is running in "local" mode (Running job: job_local_0001).  This
>> basically means that the hadoop configuration is not present on the
>> classpath of the java client kicking off the job.  If you weren't planning
>> to have the hadoop config on that machine, you might be able to get away
>> with setting "mapred.job.tracker" and probably also "fs.default.name" on
>> the Configuration object.
>>
>> Billie
>>
>>
>>
>> On Wed, Jan 16, 2013 at 12:07 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>>
>>> Cool, thanks for the feedback John, the examples have been helpful in
>>> getting up and running!
>>>
>>> Perhaps I'm not doing something quite right.  When I jar up my jobs and
>>> deploy the jar to the server and run it via the tool.sh command on the
>>> cluster, I see the job running in the jobtracker (servername:50030) and it
>>> runs as I would expect.
>>>
>>> 13/01/16 14:39:53 INFO mapred.JobClient: Running job:
>>> job_201301161326_0006
>>> 13/01/16 14:39:54 INFO mapred.JobClient:  map 0% reduce 0%
>>> 13/01/16 14:41:29 INFO mapred.JobClient:  map 50% reduce 0%
>>> 13/01/16 14:41:35 INFO mapred.JobClient:  map 100% reduce 0%
>>> 13/01/16 14:41:40 INFO mapred.JobClient: Job complete:
>>> job_201301161326_0006
>>> 13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
>>> 13/01/16 14:41:40 INFO mapred.JobClient:   Job Counters
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=180309
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all
>>> reduces waiting after reserving slots (ms)=0
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all
>>> maps waiting after reserving slots (ms)=0
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Rack-local map tasks=2
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Launched map tasks=2
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Output Format Counters
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Written=0
>>> 13/01/16 14:41:40 INFO mapred.JobClient:   FileSystemCounters
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     HDFS_BYTES_READ=248
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60214
>>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Input Format Counters
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Read=0
>>> 13/01/16 14:41:40 INFO mapred.JobClient:   Map-Reduce Framework
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Map input records=1036434
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Physical memory (bytes)
>>> snapshot=373760000
>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Spilled Records=0
Mike Hugo 2013-01-17, 20:41