Accumulo >> mail # user >> Programmatically invoking a Map/Reduce job


Mike Hugo 2013-01-16, 17:11
John Vines 2013-01-16, 17:20
Mike Hugo 2013-01-16, 20:07
Billie Rinaldi 2013-01-16, 21:11
Re: Programmatically invoking a Map/Reduce job
Thanks Billie!

Setting "mapred.job.tracker" and "fs.default.name" in the conf has gotten
me further.

        job.getConfiguration().set("mapred.job.tracker", "server_name_here:8021");
        job.getConfiguration().set("fs.default.name", "hdfs://server_name_here:8020");

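For reference, here is a minimal sketch of how those two settings could be derived from a single host name. It assumes the JobTracker listens on port 8021 and the NameNode on port 8020, as in the lines above. The class and method names (ClusterSettings, forHost) are made up for illustration and are not part of Hadoop or Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or Accumulo.
class ClusterSettings {

    // Builds the two client-side properties for a cluster whose JobTracker
    // (port 8021) and NameNode (port 8020) both run on the given host.
    // The ports are assumptions taken from the example above.
    static Map<String, String> forHost(String host) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapred.job.tracker", host + ":8021");
        settings.put("fs.default.name", "hdfs://" + host + ":8020");
        return settings;
    }

    public static void main(String[] args) {
        // Print the properties; in a real client each pair would be passed to
        // job.getConfiguration().set(key, value) before submitting the job.
        for (Map.Entry<String, String> e : forHost("server_name_here").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Keeping the host name in one place should make it harder for the two properties to point at different machines by accident.
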
What's interesting now is that the job can't find the Accumulo classes.
When I run it, I get:

2013-01-17 12:59:25,278 [main] INFO  mapred.JobClient  - Task Id : attempt_201301171102_0012_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat

Is there a way to tell the job (via the Job API, from a separate machine
that is not running Hadoop) about extra libraries to include on the job's
classpath?

Thanks

Mike

On Wed, Jan 16, 2013 at 3:11 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:

> Your job is running in "local" mode (Running job: job_local_0001).  This
> basically means that the hadoop configuration is not present on the
> classpath of the java client kicking off the job.  If you weren't planning
> to have the hadoop config on that machine, you might be able to get away
> with setting "mapred.job.tracker" and probably also "fs.default.name" on
> the Configuration object.
>
> Billie
>
>
>
> On Wed, Jan 16, 2013 at 12:07 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>
>> Cool, thanks for the feedback John, the examples have been helpful in
>> getting up and running!
>>
>> Perhaps I'm not doing something quite right.  When I jar up my jobs and
>> deploy the jar to the server and run it via the tool.sh command on the
>> cluster, I see the job running in the jobtracker (servername:50030) and it
>> runs as I would expect.
>>
>> 13/01/16 14:39:53 INFO mapred.JobClient: Running job:
>> job_201301161326_0006
>> 13/01/16 14:39:54 INFO mapred.JobClient:  map 0% reduce 0%
>> 13/01/16 14:41:29 INFO mapred.JobClient:  map 50% reduce 0%
>> 13/01/16 14:41:35 INFO mapred.JobClient:  map 100% reduce 0%
>> 13/01/16 14:41:40 INFO mapred.JobClient: Job complete:
>> job_201301161326_0006
>> 13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
>> 13/01/16 14:41:40 INFO mapred.JobClient:   Job Counters
>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=180309
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all
>> reduces waiting after reserving slots (ms)=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Rack-local map tasks=2
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Launched map tasks=2
>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Output Format Counters
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Written=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:   FileSystemCounters
>> 13/01/16 14:41:40 INFO mapred.JobClient:     HDFS_BYTES_READ=248
>> 13/01/16 14:41:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60214
>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Input Format Counters
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Read=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:   Map-Reduce Framework
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Map input records=1036434
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Physical memory (bytes)
>> snapshot=373760000
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Spilled Records=0
>> 13/01/16 14:41:40 INFO mapred.JobClient:     CPU time spent (ms)=24410
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total committed heap usage
>> (bytes)=168394752
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Virtual memory (bytes)
>> snapshot=2124627968
>> 13/01/16 14:41:40 INFO mapred.JobClient:     Map output records=2462684
>> 13/01/16 14:41:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=248
>>
>>
>>
>> When I kick off a job via a java client running on a different host, the
>> job seems to run (I can see things being scanned and ingested) but I don't
Billie Rinaldi 2013-01-17, 19:57
Mike Hugo 2013-01-17, 20:41