Accumulo >> mail # user >> Programtically invoking a Map/Reduce job


Re: Programtically invoking a Map/Reduce job
Cool, thanks for the feedback John, the examples have been helpful in
getting up and running!

Perhaps I'm not doing something quite right. When I jar up my jobs, deploy
the jar to the server, and run it via the tool.sh command on the cluster, I
see the job in the JobTracker UI (servername:50030) and it runs as I would
expect:

13/01/16 14:39:53 INFO mapred.JobClient: Running job: job_201301161326_0006
13/01/16 14:39:54 INFO mapred.JobClient:  map 0% reduce 0%
13/01/16 14:41:29 INFO mapred.JobClient:  map 50% reduce 0%
13/01/16 14:41:35 INFO mapred.JobClient:  map 100% reduce 0%
13/01/16 14:41:40 INFO mapred.JobClient: Job complete: job_201301161326_0006
13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
13/01/16 14:41:40 INFO mapred.JobClient:   Job Counters
13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=180309
13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient:     Rack-local map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient:     Launched map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/01/16 14:41:40 INFO mapred.JobClient:   File Output Format Counters
13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Written=0
13/01/16 14:41:40 INFO mapred.JobClient:   FileSystemCounters
13/01/16 14:41:40 INFO mapred.JobClient:     HDFS_BYTES_READ=248
13/01/16 14:41:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60214
13/01/16 14:41:40 INFO mapred.JobClient:   File Input Format Counters
13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Read=0
13/01/16 14:41:40 INFO mapred.JobClient:   Map-Reduce Framework
13/01/16 14:41:40 INFO mapred.JobClient:     Map input records=1036434
13/01/16 14:41:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=373760000
13/01/16 14:41:40 INFO mapred.JobClient:     Spilled Records=0
13/01/16 14:41:40 INFO mapred.JobClient:     CPU time spent (ms)=24410
13/01/16 14:41:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=168394752
13/01/16 14:41:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2124627968
13/01/16 14:41:40 INFO mapred.JobClient:     Map output records=2462684
13/01/16 14:41:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=248

When I kick off a job from a Java client running on a different host, the
job seems to run (I can see things being scanned and ingested), but nothing
shows up in the JobTracker UI on the server. Is that normal? Or do I have
something misconfigured?

Here's how I'm starting things from the client:

    @Override
    public int run(String[] strings) throws Exception {
        Job job = new Job(getConf(), getClass().getSimpleName());
        job.setJarByClass(getClass());
        job.setMapperClass(MyMapper.class);

        job.setInputFormatClass(AccumuloRowInputFormat.class);
        AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);

        AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                "...",
                new Authorizations());

        job.setNumReduceTasks(0);

        job.setOutputFormatClass(AccumuloOutputFormat.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Mutation.class);

        boolean createTables = true;
        String defaultTable = "...";
        AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                createTables,
                defaultTable);

        AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);

        job.waitForCompletion(true);

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(CachedConfiguration.getInstance(),
                new ...(), args);
        System.exit(res);
    }

Here's the output when I run it via the client application:
2013-01-16 13:55:57,645 [main-SendThread()] INFO  zookeeper.ClientCnxn  - Opening socket connection to server accumulo/10.1.10.160:2181
2013-01-16 13:55:57,660 [main-SendThread(accumulo:2181)] INFO  zookeeper.ClientCnxn  - Socket connection established to accumulo/10.1.10.160:2181, initiating session
2013-01-16 13:55:57,671 [main-SendThread(accumulo:2181)] INFO  zookeeper.ClientCnxn  - Session establishment complete on server accumulo/10.1.10.160:2181, sessionid = 0x13c449cfe010434, negotiated timeout = 30000
2013-01-16 13:55:58,379 [main] INFO  mapred.JobClient  - Running job: job_local_0001
2013-01-16 13:55:58,447 [Thread-16] INFO  mapred.Task  -  Using ResourceCalculatorPlugin : null
2013-01-16 13:55:59,383 [main] INFO  mapred.JobClient  -  map 0% reduce 0%
2013-01-16 13:56:04,458 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:07,459 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:10,461 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:13,462 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:16,463 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:19,465 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:21,783 [Thread-16] INFO  mapred.Task  - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-16 13:56:21,783 [Thread-16] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:21,784 [Thread-16] INFO  mapred.Task  - Task 'attempt_local_0001_m_000000_0' done.
2013-01-16 13:56:21,786 [Thread-16] INFO  mapred.Task  -  Using ResourceCalculatorPlugin : null
2013-01-16 13:56:22,423 [main] INFO  mapred.JobClient  -  map 100% reduce 0%
2013-01-16 13:56:27,788 [communication thread] INFO  mapred.LocalJobRunner  -
2013-01-16 13:56:
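
One difference between the two runs that may be worth checking: the cluster
run reports job_201301161326_0006, while the client run reports
job_local_0001 and logs from mapred.LocalJobRunner. In Hadoop 1.x that ID
pattern usually means the job executed inside the client JVM rather than
being submitted to a JobTracker, which would be consistent with nothing
appearing in the UI. This can happen when the client's Configuration does not
carry the cluster's mapred.job.tracker setting. Whether that is the cause here
is an assumption, not something the logs prove. A minimal sketch for telling
the two cases apart from a log line (the class and method names are
hypothetical, made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobIdCheck {
    // Matches job IDs of the two shapes seen in the logs above:
    // job_<timestamp>_<seq> and job_local_<seq>.
    private static final Pattern JOB_ID =
            Pattern.compile("job_(local|\\d+)_(\\d+)");

    /** Returns the first job ID found in a log line, or null if there is none. */
    public static String extractJobId(String logLine) {
        Matcher m = JOB_ID.matcher(logLine);
        return m.find() ? m.group() : null;
    }

    /** True if the ID has the form the local runner uses (job_local_...). */
    public static boolean isLocalJob(String jobId) {
        return jobId != null && jobId.startsWith("job_local_");
    }

    public static void main(String[] args) {
        // The two "Running job" lines quoted in this message.
        String[] lines = {
            "13/01/16 14:39:53 INFO mapred.JobClient: Running job: job_201301161326_0006",
            "2013-01-16 13:55:58,379 [main] INFO  mapred.JobClient  - Running job: job_local_0001"
        };
        for (String line : lines) {
            String id = extractJobId(line);
            String kind = isLocalJob(id) ? "local runner" : "submitted to a JobTracker";
            System.out.println(id + " -> " + kind);
        }
    }
}
```

The check only inspects the ID format. It does not contact the cluster or
read any Hadoop configuration.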