MapReduce >> mail # user >> Re: Error using hadoop in non-distributed mode

Re: Error using hadoop in non-distributed mode
Thanks! You nailed it.

Mahout was using the cache, but fortunately there was an easy way to tell it not to, and now the jobs run locally in a debugging setup.
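The message above doesn't name the switch, but Mahout's SSVDSolver of that era exposed a broadcast flag controlling whether side matrices go through the DistributedCache. A hedged sketch, assuming a Mahout 0.7-style API; the constructor arguments, paths, and `setBroadcast(false)` are my guesses at the "easy way" and should be checked against the Mahout version in use:

```java
// Hedged sketch: turning off DistributedCache broadcast in Mahout SSVD.
// All paths and numeric parameters below are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver;

public class LocalSsvd {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    SSVDSolver solver = new SSVDSolver(conf,
        new Path[] {new Path("ssvd/input")},   // illustrative input path
        new Path("ssvd/output"),               // illustrative output path
        10000 /* block height */, 100 /* k */, 15 /* p */, 1 /* reducers */);
    solver.setBroadcast(false);  // keep side data off the DistributedCache
    solver.run();
  }
}
```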
On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:


The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip> is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a MapReduce job. (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).
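For readers unfamiliar with the feature, a minimal sketch of typical DistributedCache usage with the Hadoop 1.x API (the file name here is illustrative, not from this thread); it needs the Hadoop jars and a running job to actually execute:

```java
// Sketch of DistributedCache usage with the Hadoop 1.x API.
// The cached file name below is an illustrative assumption.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Driver side: register a file to be copied to every task node.
    DistributedCache.addCacheFile(new URI("/user/pat/lookup.dat"), conf);
    // ... submit the job with this conf ...
  }

  // Task side (e.g. in Mapper.setup): read the localized copies.
  static Path[] localizedFiles(Configuration conf) throws Exception {
    // On a real cluster these resolve to paths under
    // ${mapred.local.dir}/archive/..., like the one in the error above.
    return DistributedCache.getLocalCacheFiles(conf);
  }
}
```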

You have mentioned Mahout, so I am assuming that the specific analysis job you are running is using this feature to distribute the output file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 to the job that is failing.

Also, I found links stating that the distributed cache does not work in local (non-HDFS) mode. (http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop). Look at the second answer.
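For context, local mode in Hadoop 1.x is selected by the mapred.job.tracker property, and with the in-process LocalJobRunner there is no tasktracker to localize cache files, which matches the behaviour described in that answer. A minimal configuration sketch (property names are the 1.x ones):

```java
// Sketch: forcing the Hadoop 1.x LocalJobRunner from code.
import org.apache.hadoop.conf.Configuration;

public class LocalModeConf {
  public static Configuration localConf() {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");  // jobs run in-process
    conf.set("fs.default.name", "file:///");  // local FS instead of HDFS
    return conf;
  }
}
```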

On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:
The job is creating several output and intermediate files, all under the location Users/pat/Projects/big-data/b/ssvd/. Several output directories and files are created correctly, and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000" be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"?
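The rewritten path can be read as the tasktracker's cache-localization layout: a unique per-file directory under the local archive dir, with the original path appended under file/. A minimal sketch of that layout, inferred only from the paths in this thread (the naming of the unique directory is an assumption, not Hadoop's actual localization algorithm):

```java
// Sketch: how a DistributedCache entry appears to be localized on disk,
// inferred from the paths in this thread. The uniqueId naming scheme is
// an assumption, not Hadoop's exact algorithm.
public class CacheLocalizer {
  static String localize(String localDir, String uniqueId, String originalPath) {
    // <local dir>/archive/<uniqueId>/file/<original path>
    return localDir + "/archive/" + uniqueId + "/file/" + originalPath;
  }

  public static void main(String[] args) {
    System.out.println(localize(
        "/tmp/hadoop-pat/mapred/local",
        "6590995089539988730_1587570556_37122331",
        "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"));
  }
}
```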

On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <[EMAIL PROTECTED]> wrote:

Hi Pat,
            Please specify correct input file location.
Thanks & Regards,

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:
Using Hadoop with Mahout in a local filesystem/non-HDFS config for debugging purposes inside IntelliJ IDEA. When I run one particular part of the analysis I get the error below. I didn't write the code, but we are looking for a hint about what might cause it. This job completes without error in a single-node pseudo-clustered config outside of IDEA.

Several jobs in the pipeline complete without error, creating part files just fine in the local file system.

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000, which is the subject of the error, does not exist. The file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does exist at the time of the error. So the code is looking for the data in the wrong place?

12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '', transport: 'socket'