MapReduce >> mail # dev >> Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%


Nitika Gupta 2011-11-22, 21:20
Re: Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%
Hi Nitika,
There are a few things you can check if your MapReduce job is failing.
Here are two:

1. Make sure the URL of the S3 bucket includes a terminating slash.
2. Make sure the output directory does not already exist.
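
The two checks above could be sketched as a small pre-flight helper, shown here only as an illustration. The class name JobPathChecks, both method names, and the `exists` predicate are hypothetical and not part of the original job. In a real driver the predicate would presumably be backed by a lookup such as Hadoop's FileSystem.exists(Path), which is left out so the sketch has no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the original driver.
final class JobPathChecks {

    // Appends a terminating slash when the URL does not already end with one.
    static String withTrailingSlash(String url) {
        return url.endsWith("/") ? url : url + "/";
    }

    // Returns one message per failed check; an empty list means both passed.
    // `exists` is an assumed stand-in for a real file system lookup.
    static List<String> check(String bucketUrl, String outputUrl,
                              Predicate<String> exists) {
        List<String> problems = new ArrayList<>();
        if (!bucketUrl.endsWith("/")) {
            problems.add("missing terminating slash: " + bucketUrl);
        }
        if (exists.test(outputUrl)) {
            problems.add("output directory already exists: " + outputUrl);
        }
        return problems;
    }
}
```

Calling check(...) before submitting the job and logging any returned messages would surface both conditions early. A glob input such as the one in the quoted log ends in `*`, so it would need to be handled separately from the slash check.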

Thanks
Deepak
On Wed, Nov 23, 2011 at 2:50 AM, Nitika Gupta <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I am trying to run a mapreduce job to process the Amazon S3 logs.
> However, the code hangs at INFO mapred.JobClient: map 0% reduce 0% and
> does not even attempt to launch the tasks. The sample code for the job
> setup is given below:
>
> public int run(CommandLine cl) throws Exception
> {
>     Configuration conf = getConf();
>     String inputPath = "";
>     String outputPath = "";
>     try
>     {
>         Job job = new Job(conf, "Dummy");
>         job.setNumReduceTasks(0); // map-only job
>         job.setMapperClass(Mapper.class);
>         inputPath = cl.getOptionValue("input"); // input is an s3n path
>         outputPath = cl.getOptionValue("output");
>         FileInputFormat.setInputPaths(job, inputPath);
>         FileOutputFormat.setOutputPath(job, new Path(outputPath));
>         _log.info("Input path set as " + inputPath);
>         _log.info("Output path set as " + outputPath);
>         job.waitForCompletion(true);
>         return 0;
>     }
>     catch (Exception ex)
>     {
>         _log.error(ex);
>         return 1;
>     }
> }
> The above code works on the staging machine. However, it fails on the
> production machine, which has the same setup as the staging machine but
> more capacity.
>
> Job Run:
> 11/11/22 16:13:38 INFO Driver: Input path being processed is s3n://abc/yyyy/mm/dd/*
> 11/11/22 16:13:38 INFO Driver: Output path being processed is s3n://xyz/yyyy/mm/dd/00/
> 11/11/22 16:13:51 INFO mapred.FileInputFormat: Total input paths to process : 399
> 11/11/22 16:13:53 INFO mapred.JobClient: Running job: job_201111151645_14535
> 11/11/22 16:13:54 INFO mapred.JobClient:  map 0% reduce 0%
>
> --- It hangs at this point.
>
> Does anyone know what could be causing this?
>
> Thanks in advance!
>
> Nitika
>

--
Deepak Sharma
http://www.linkedin.com/in/rikindia
Nitika Gupta 2011-11-23, 19:28