Re: problem using s3 instead of hdfs
Yanbo Liang, 2012-10-16, 10:59
Because you did not set defaultFS in your conf, you need to explicitly specify the absolute path (including the scheme) of the file in S3 when you run an MR job.

2012/10/16 Rahul Patodi <[EMAIL PROTECTED]>

> I think these blog posts will answer your question:
>
> http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html
> http://www.technology-mania.com/2011/05/s3-as-input-or-output-for-hadoop-mr.html
>
> On Tue, Oct 16, 2012 at 1:30 PM, sudha sadhasivam <[EMAIL PROTECTED]> wrote:
>
>> Is there a time delay in fetching data from S3 to the Hadoop cluster, compared to a regular Hadoop cluster setup? Can Elastic Block Store (EBS) be used for this purpose?
>> G Sudha
>>
>> --- On *Tue, 10/16/12, Hemanth Yamijala <[EMAIL PROTECTED]>* wrote:
>>
>> From: Hemanth Yamijala <[EMAIL PROTECTED]>
>> Subject: Re: problem using s3 instead of hdfs
>> To: [EMAIL PROTECTED]
>> Date: Tuesday, October 16, 2012, 12:41 PM
>>
>> Hi,
>>
>> I've not tried this on S3. However, the directory mentioned in the exception is based on the value of the configuration key mapreduce.jobtracker.staging.root.dir, which defaults to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3 location and try?
>>
>> Thanks
>> Hemanth
>>
>> On Mon, Oct 15, 2012 at 10:43 PM, Parth Savani <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>> I am trying to run Hadoop on S3 in distributed mode, but I am having trouble running my job successfully on it. I get the following error.
>> I followed the instructions in this article: http://wiki.apache.org/hadoop/AmazonS3
>> I replaced the fs.default.name value in my hdfs-site.xml with s3n://ID:SECRET@BUCKET
>> and I am running my job with: hadoop jar /path/to/my/jar/abcd.jar /input /output
>> where /input is the folder inside the S3 bucket (s3n://ID:SECRET@BUCKET/input)
>> and the /output folder should be created in my bucket (s3n://ID:SECRET@BUCKET/output).
>> Below is the error I get.
>> It is looking for job.jar on S3, but that path is on my server, from where I am launching my job.
>>
>> java.io.FileNotFoundException: No such file or directory '/opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging/job_201207021606_1036/job.jar'
>>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:412)
>>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
>>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
>>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
>>     at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:222)
>>     at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1372)
>>     at java.security.AccessController.doPri
>
> --
> *Regards*,
> Rahul Patodi
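Yanbo's advice comes down to how a path without a scheme is interpreted. The sketch below is illustrative only: it uses plain java.net.URI rather than Hadoop's own Path/FileSystem classes, and the host namenode:8020 and the bucket my-bucket are hypothetical. It shows that "/input" picks up its scheme and authority from whatever default filesystem is configured, whereas a fully qualified s3n:// URI is passed through unchanged.

```java
import java.net.URI;

/**
 * Illustration only, NOT Hadoop's Path.makeQualified: a scheme-less argument
 * such as "/input" takes its scheme and authority from the default
 * filesystem URI, while a fully qualified URI is used as-is.
 */
public class QualifyPath {

    // Qualify a job argument against an assumed default filesystem URI.
    public static URI qualify(URI defaultFs, String arg) {
        URI uri = URI.create(arg);
        if (uri.getScheme() != null) {
            // Scheme present (e.g. s3n://...): already fully qualified.
            return uri;
        }
        // No scheme: resolve against the default filesystem's root.
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        // Hypothetical default filesystem.
        URI hdfsDefault = URI.create("hdfs://namenode:8020/");

        // A scheme-less path lands on whatever the default filesystem is.
        System.out.println(qualify(hdfsDefault, "/input"));
        // prints hdfs://namenode:8020/input

        // A fully qualified S3 path is unaffected by the default.
        System.out.println(qualify(hdfsDefault, "s3n://my-bucket/input"));
        // prints s3n://my-bucket/input
    }
}
```

Applied to Parth's command, this would mean passing the fully qualified bucket paths, e.g. hadoop jar /path/to/my/jar/abcd.jar s3n://ID:SECRET@BUCKET/input s3n://ID:SECRET@BUCKET/output.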
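Hemanth's explanation can be checked against the path in the exception. The following is a simplified sketch, not Hadoop's Configuration code, and it needs Java 9 or later. It expands the default value ${hadoop.tmp.dir}/mapred/staging and appends the <user>/.staging layout that appears in the exception. The hadoop.tmp.dir value /opt/data/hadoop/hadoop-mapred is an assumption inferred from that path, and the bucket my-bucket is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified sketch, NOT Hadoop's Configuration code: expands ${name}
 * references in a property value and builds a per-user staging directory.
 */
public class StagingPath {

    // Matches ${name}; the name may not contain '}', '$' or spaces.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$ ]+)\\}");

    // Single-pass substitution; unknown names are left verbatim.
    public static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // <root>/<user>/.staging, the layout visible in the exception path.
    public static String stagingDir(String stagingRoot, String user) {
        return stagingRoot + "/" + user + "/.staging";
    }

    public static void main(String[] args) {
        // Assumed value, inferred from the path in the exception.
        Map<String, String> props = Map.of("hadoop.tmp.dir", "/opt/data/hadoop/hadoop-mapred");

        // Default root: a local-looking path with no scheme.
        String localRoot = expand("${hadoop.tmp.dir}/mapred/staging", props);
        System.out.println(stagingDir(localRoot, "psavani"));
        // prints /opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging

        // Explicit S3 root, as Hemanth suggests (bucket name is hypothetical).
        String s3Root = expand("s3n://my-bucket/mapred/staging", props);
        System.out.println(stagingDir(s3Root, "psavani"));
        // prints s3n://my-bucket/mapred/staging/psavani/.staging
    }
}
```

With the default, the staging root is a scheme-less, local-looking path, and the stack trace shows NativeS3FileSystem.getFileStatus looking that path up. Pointing mapreduce.jobtracker.staging.root.dir at an s3n:// location makes the staging path refer to S3 explicitly, though Hemanth notes he has not tried this on S3 himself.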