MapReduce user mailing list - Re: problem using s3 instead of hdfs


In this thread (collapsed messages):
  Hemanth Yamijala   2012-10-16, 07:11
  sudha sadhasivam   2012-10-16, 08:00
  Rahul Patodi       2012-10-16, 10:23
  Yanbo Liang        2012-10-16, 10:59
  Parth Savani       2012-10-16, 14:32

Re: problem using s3 instead of hdfs
Parth,

I notice in the stack trace below that the LocalJobRunner, rather than the
JobTracker, is being used. Are you sure this is a distributed cluster?
Could you please check the value of mapred.job.tracker?

Thanks
Hemanth
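
A minimal sketch of the check Hemanth suggests, assuming Hadoop 1.x as the
stack trace indicates (the host and port below are hypothetical): in
mapred-site.xml, a value of "local" (the default) selects the LocalJobRunner
seen in the trace, while a host:port pair points the client at a real
JobTracker.

    <!-- mapred-site.xml; jobtracker.example.com:9001 is a placeholder -->
    <property>
      <name>mapred.job.tracker</name>
      <!-- "local" (the default) runs jobs in-process via LocalJobRunner -->
      <value>jobtracker.example.com:9001</value>
    </property>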

On Tue, Oct 16, 2012 at 8:02 PM, Parth Savani <[EMAIL PROTECTED]> wrote:

> Hello Hemanth,
>         I set the Hadoop staging directory to an S3 location. However, it
> complains. Below is the error:
>
> 12/10/16 10:22:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://ABCD:ABCD@ABCD/tmp/mapred/staging/psavani1821193643/.staging, expected: file:///
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
>   at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:322)
>   at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:79)
>   at org.apache.hadoop.mapred.LocalJobRunner.getStagingAreaDir(LocalJobRunner.java:541)
>   at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1204)
>   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
>   at com.sensenetworks.macrosensedata.ParseLogsMacrosense.run(ParseLogsMacrosense.java:54)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at com.sensenetworks.macrosensedata.ParseLogsMacrosense.main(ParseLogsMacrosense.java:121)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>
>
> On Tue, Oct 16, 2012 at 3:11 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I've not tried this on S3. However, the directory mentioned in the
>> exception is based on the value of this particular configuration
>> key: mapreduce.jobtracker.staging.root.dir. This defaults
>> to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
>> location and try? (See the configuration sketch at the end of this message.)
>>
>> Thanks
>> Hemanth
>>
>>
>> On Mon, Oct 15, 2012 at 10:43 PM, Parth Savani <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>       I am trying to run Hadoop against S3 in distributed mode. However,
>>> I am having trouble running my job successfully on it.
>>> I followed the instructions provided in this article ->
>>> http://wiki.apache.org/hadoop/AmazonS3
>>> I replaced the fs.default.name value in my hdfs-site.xml with
>>> s3n://ID:SECRET@BUCKET
>>> and I am running my job using the following: hadoop jar
>>> /path/to/my/jar/abcd.jar /input /output
>>> where */input* is the folder name inside the S3 bucket
>>> (s3n://ID:SECRET@BUCKET/input)
>>> and the */output* folder should be created in my bucket
>>> (s3n://ID:SECRET@BUCKET/output).
>>> Below is the error I get. It is looking for job.jar on S3, but that path
>>> is on the server from which I am launching my job.
>>>
>>> java.io.FileNotFoundException: No such file or directory '/opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging/job_201207021606_1036/job.jar'
>>>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:412)
>>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
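
Pulling the thread's suggestions together, a minimal sketch of the relevant
configuration (Hadoop 1.x; ID, SECRET, and BUCKET are placeholders, and
fs.default.name conventionally lives in core-site.xml rather than
hdfs-site.xml):

    <!-- core-site.xml: default filesystem on S3,
         per http://wiki.apache.org/hadoop/AmazonS3 -->
    <property>
      <name>fs.default.name</name>
      <value>s3n://ID:SECRET@BUCKET</value>
    </property>

    <!-- mapred-site.xml: staging root on S3, as Hemanth suggests;
         it defaults to ${hadoop.tmp.dir}/mapred/staging -->
    <property>
      <name>mapreduce.jobtracker.staging.root.dir</name>
      <value>s3n://ID:SECRET@BUCKET/tmp/mapred/staging</value>
    </property>

Credentials can also be kept out of the URI with the fs.s3n.awsAccessKeyId
and fs.s3n.awsSecretAccessKey properties.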
In this thread (collapsed message):
  Parth Savani       2012-10-16, 14:34