Accumulo >> mail # user >> Programmatically invoking a Map/Reduce job


Mike Hugo 2013-01-16, 17:11
John Vines 2013-01-16, 17:20
Re: Programmatically invoking a Map/Reduce job
Cool, thanks for the feedback, John. The examples have been helpful in
getting up and running!

Perhaps I'm not doing something quite right.  When I jar up my jobs, deploy
the jar to the server, and run it via the tool.sh command on the cluster, I
see the job running in the jobtracker (servername:50030), and it runs as I
would expect.

13/01/16 14:39:53 INFO mapred.JobClient: Running job: job_201301161326_0006
13/01/16 14:39:54 INFO mapred.JobClient:  map 0% reduce 0%
13/01/16 14:41:29 INFO mapred.JobClient:  map 50% reduce 0%
13/01/16 14:41:35 INFO mapred.JobClient:  map 100% reduce 0%
13/01/16 14:41:40 INFO mapred.JobClient: Job complete: job_201301161326_0006
13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
13/01/16 14:41:40 INFO mapred.JobClient:   Job Counters
13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=180309
13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient:     Rack-local map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient:     Launched map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/01/16 14:41:40 INFO mapred.JobClient:   File Output Format Counters
13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Written=0
13/01/16 14:41:40 INFO mapred.JobClient:   FileSystemCounters
13/01/16 14:41:40 INFO mapred.JobClient:     HDFS_BYTES_READ=248
13/01/16 14:41:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60214
13/01/16 14:41:40 INFO mapred.JobClient:   File Input Format Counters
13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Read=0
13/01/16 14:41:40 INFO mapred.JobClient:   Map-Reduce Framework
13/01/16 14:41:40 INFO mapred.JobClient:     Map input records=1036434
13/01/16 14:41:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=373760000
13/01/16 14:41:40 INFO mapred.JobClient:     Spilled Records=0
13/01/16 14:41:40 INFO mapred.JobClient:     CPU time spent (ms)=24410
13/01/16 14:41:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=168394752
13/01/16 14:41:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2124627968
13/01/16 14:41:40 INFO mapred.JobClient:     Map output records=2462684
13/01/16 14:41:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=248

When I kick off a job via a Java client running on a different host, the
job seems to run (I can see things being scanned and ingested), but I don't
see anything in the jobtracker UI on the server.  Is that normal?  Or do I
have something misconfigured?

Here's how I'm starting things from the client:

    @Override
    public int run(String[] strings) throws Exception {
        Job job = new Job(getConf(), getClass().getSimpleName());
        job.setJarByClass(getClass());
        job.setMapperClass(MyMapper.class);

        job.setInputFormatClass(AccumuloRowInputFormat.class);
        AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);

        AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                "...",
                new Authorizations());

        job.setNumReduceTasks(0);

        job.setOutputFormatClass(AccumuloOutputFormat.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Mutation.class);

        boolean createTables = true;
        String defaultTable = "...";
        AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                createTables,
                defaultTable);

        AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);

        job.waitForCompletion(true);

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int res = ToolRunner.run(CachedConfiguration.getInstance(),
                new ...(), args);
        System.exit(res);
    }

Here's the output when I run it via the client application:
2013-01-16 13:55:57,645 [main-SendThread()] INFO  zookeeper.ClientCnxn  - Opening socket connection to server accumulo/10.1.10.160:2181
2013-01-16 13:55:57,660 [main-SendThread(accumulo:2181)] INFO  zookeeper.ClientCnxn  - Socket connection established to accumulo/10.1.10.160:2181, initiating session
2013-01-16 13:55:57,671 [main-SendThread(accumulo:2181)] INFO  zookeeper.ClientCnxn  - Session establishment complete on server accumulo/10.1.10.160:2181, sessionid = 0x13c449cfe010434, negotiated timeout = 30000
2013-01-16 13:55:58,379 [main] INFO  mapred.JobClient  - Running job: job_local_0001
2013-01-16 13:55:58,447 [Thread-16] INFO  mapred.Task  -  Using ResourceCalculatorPlugin : null
2013-01-16 13:55:59,383 [main] INFO  mapred.JobClient  -  map 0% reduce 0%
2013-01-16 13:56:04,458 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:07,459 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:10,461 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:13,462 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:16,463 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:19,465 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:21,783 [Thread-16] INFO  mapred.Task  - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-16 13:56:21,783 [Thread-16] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:21,784 [Thread-16] INFO  mapred.Task  - Task 'attempt_local_0001_m_000000_0' done.
2013-01-16 13:56:21,786 [Thread-16] INFO  mapred.Task  -  Using ResourceCalculatorPlugin : null
2013-01-16 13:56:22,423 [main] INFO  mapred.JobClient  -  map 100% reduce 0%
2013-01-16 13:56:27,788 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:
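
One difference between the two runs stands out. The cluster run reports
job_201301161326_0006, while the client run reports job_local_0001 and logs
from mapred.LocalJobRunner. In Hadoop 1.x a "local" job ID comes from the
in-process LocalJobRunner, which would explain why nothing shows up in the
jobtracker UI. The likely cause is that the client's Configuration is not
picking up the cluster's mapred.job.tracker setting, though that is an
inference from the logs and not something confirmed in this message. As a
rough sketch of the check (JobIdCheck and isLocalJob are hypothetical names,
not part of Hadoop or Accumulo):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a job ID as printed by JobClient
// after "Running job:". Not part of Hadoop or Accumulo.
class JobIdCheck {
    // job_<identifier>_<sequence>, e.g. job_local_0001 or job_201301161326_0006
    private static final Pattern JOB_ID = Pattern.compile("job_([^_]+)_(\\d+)");

    // True when the identifier part is "local", as seen in the client run above.
    static boolean isLocalJob(String jobId) {
        Matcher m = JOB_ID.matcher(jobId);
        return m.matches() && m.group(1).equals("local");
    }

    public static void main(String[] args) {
        // The two job IDs that appear in the logs in this message.
        System.out.println(isLocalJob("job_local_0001"));
        System.out.println(isLocalJob("job_201301161326_0006"));
    }
}
```

A check like this only reads the symptom from the log; it does not change
where the job runs.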
Billie Rinaldi 2013-01-16, 21:11
Mike Hugo 2013-01-17, 19:16
Billie Rinaldi 2013-01-17, 19:57
Mike Hugo 2013-01-17, 20:41