MapReduce >> mail # dev >> Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%


Re: Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%
Hi Nitika,
Here are a few things to check if your MapReduce job is failing:

1. Make sure the URL to the S3 bucket includes a terminating slash.
2. Make sure the output directory does not already exist.
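The two checks above can also be run in the driver before the job is submitted. The Java sketch below is hypothetical: `JobPathChecks` and its methods are illustrative names, not part of Hadoop. The existence check is passed in as a `Predicate<String>` so the example runs without a cluster. In a real driver it would presumably be backed by Hadoop's `FileSystem.exists(Path)`. The sketch also assumes that tip 1 is about bare bucket URLs such as `s3n://abc`. It leaves alone any path that already has a `/` after the bucket name, such as the glob `s3n://abc/yyyy/mm/dd/*`, because appending a slash there would change its meaning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical pre-flight checks for a MapReduce driver; not a Hadoop API.
final class JobPathChecks {

    private JobPathChecks() {}

    // True for a bare bucket URL such as "s3n://abc" (no '/' after the bucket name).
    static boolean isBareBucketWithoutSlash(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return false; // not scheme-qualified; leave it alone
        }
        String rest = url.substring(schemeEnd + 3);
        return !rest.isEmpty() && rest.indexOf('/') < 0;
    }

    // Appends a terminating slash if the URL does not already end with one.
    static String withTrailingSlash(String url) {
        return url.endsWith("/") ? url : url + "/";
    }

    // Returns human-readable problems; an empty list means both checks passed.
    // pathExists stands in for a real check such as FileSystem.exists (assumption).
    static List<String> validate(String inputUrl, String outputUrl,
                                 Predicate<String> pathExists) {
        List<String> problems = new ArrayList<>();
        // Tip 1: bucket URLs should end with a terminating slash.
        for (String url : List.of(inputUrl, outputUrl)) {
            if (isBareBucketWithoutSlash(url)) {
                problems.add("missing terminating slash: " + url
                        + " (try " + withTrailingSlash(url) + ")");
            }
        }
        // Tip 2: the output directory must not exist before the job runs.
        if (pathExists.test(outputUrl)) {
            problems.add("output directory already exists: " + outputUrl);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Fake "file system" that contains only the output directory from the log.
        Set<String> existing = Set.of("s3n://xyz/yyyy/mm/dd/00/");
        List<String> problems = validate("s3n://abc/yyyy/mm/dd/*",
                "s3n://xyz/yyyy/mm/dd/00/", existing::contains);
        // prints: output directory already exists: s3n://xyz/yyyy/mm/dd/00/
        problems.forEach(System.out::println);
    }
}
```

Take the paths from Nitika's log and assume the output directory is already present. In that case only the second check fires, because neither URL is a bare bucket.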

Thanks
Deepak
On Wed, Nov 23, 2011 at 2:50 AM, Nitika Gupta <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I am trying to run a mapreduce job to process the Amazon S3 logs.
> However, the code hangs at INFO mapred.JobClient: map 0% reduce 0% and
> does not even attempt to launch the tasks. The sample code for the job
> setup is given below:
>
> public int run(CommandLine cl) throws Exception
> {
>     Configuration conf = getConf();
>     String inputPath = "";
>     String outputPath = "";
>     try
>     {
>         Job job = new Job(conf, "Dummy");
>         job.setNumReduceTasks(0);               // map-only job
>         job.setMapperClass(Mapper.class);       // identity mapper
>         inputPath = cl.getOptionValue("input"); // input is an s3n path
>         outputPath = cl.getOptionValue("output");
>         FileInputFormat.setInputPaths(job, inputPath);
>         FileOutputFormat.setOutputPath(job, new Path(outputPath));
>         _log.info("Input path set as " + inputPath);
>         _log.info("Output path set as " + outputPath);
>         // Return a non-zero exit code if the job fails
>         return job.waitForCompletion(true) ? 0 : 1;
>     }
>     catch (Exception ex)
>     {
>         _log.error(ex);
>         return 1;
>     }
> }
> The above code works on the staging machine. However, it fails on the
> production machine, which is the same as the staging machine but with
> more capacity.
>
> Job Run:
> 11/11/22 16:13:38 INFO Driver: Input path being processed is s3n://abc/yyyy/mm/dd/*
> 11/11/22 16:13:38 INFO Driver: Output path being processed is s3n://xyz/yyyy/mm/dd/00/
> 11/11/22 16:13:51 INFO mapred.FileInputFormat: Total input paths to process : 399
> 11/11/22 16:13:53 INFO mapred.JobClient: Running job: job_201111151645_14535
> 11/11/22 16:13:54 INFO mapred.JobClient:  map 0% reduce 0%
>
> --- It hangs at this point.
>
> Does anyone know what could be causing this hang?
>
> Thanks in advance!
>
> Nitika
>

--
Deepak Sharma
http://www.linkedin.com/in/rikindia