Nitika Gupta 2011-11-22, 21:20
Re: Issue: Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%
There are a few things to check if your mapred job is failing.
Here are some:
1. Make sure the URL to the S3 bucket includes a terminating slash.
2. Make sure the output directory does not already exist.
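
For point 1, here is a minimal sketch of how the driver could normalize the input URL before passing it to FileInputFormat. The class name S3PathCheck, the method ensureTrailingSlash and the bucket name used below are made up for illustration; none of them are part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop: makes sure an S3 input URL
// ends with a slash, as suggested in point 1 above.
final class S3PathCheck {
    private S3PathCheck() {}

    // Returns the URL unchanged if it already ends with "/",
    // otherwise returns it with a single "/" appended.
    static String ensureTrailingSlash(String url) {
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("input path must not be empty");
        }
        return url.endsWith("/") ? url : url + "/";
    }
}
```

In the run() method this would be applied to the value returned by cl.getOptionValue("input"), for example turning a made-up path such as s3n://example-bucket/logs into s3n://example-bucket/logs/. Point 2 needs a lookup against the actual file system, so it is not covered by this sketch.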
On Wed, Nov 23, 2011 at 2:50 AM, Nitika Gupta <[EMAIL PROTECTED]> wrote:
> Hi All,
> I am trying to run a mapreduce job to process the Amazon S3 logs.
> However, the code hangs at INFO mapred.JobClient: map 0% reduce 0% and
> does not even attempt to launch the tasks. The sample code for the job
> setup is given below:
> public int run(CommandLine cl) throws Exception {
>     Configuration conf = getConf();
>     String inputPath = "";
>     String outputPath = "";
>     try {
>         Job job = new Job(conf, "Dummy");
>         inputPath = cl.getOptionValue("input"); // input is an s3n path
>         outputPath = cl.getOptionValue("output");
>         FileInputFormat.setInputPaths(job, inputPath);
>         FileOutputFormat.setOutputPath(job, new Path(outputPath));
>         _log.info("Input path set as " + inputPath);
>         _log.info("Output path set as " + outputPath);
>         job.waitForCompletion(true);
>         return 0;
>     } catch (Exception ex) {
>         _log.error(ex);
>         return 1;
>     }
> }
> The above code works on the staging machine. However, it fails on the
> production machine, which is the same as the staging machine with more
> Job Run:
> 11/11/22 16:13:38 INFO Driver: Input path being processed is
> 11/11/22 16:13:38 INFO Driver: Output path being processed is
> 11/11/22 16:13:51 INFO mapred.FileInputFormat: Total input paths to
> process : 399
> 11/11/22 16:13:53 INFO mapred.JobClient: Running job:
> 11/11/22 16:13:54 INFO mapred.JobClient: map 0% reduce 0%
> --- It hangs at this point.
> Does anyone know what could be the possible reason for the error?
> Thanks in advance!
Nitika Gupta 2011-11-23, 19:28