Re: problem using s3 instead of hdfs
Because you did not set defaultFS in the conf, you need to explicitly
indicate the absolute path (including the scheme) of the file in S3 when you
run an MR job.
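
For example (reusing the placeholder bucket and paths from the thread below),
the job would be submitted with fully qualified S3 URIs rather than bare
/input and /output paths:

hadoop jar /path/to/my/jar/abcd.jar s3n://ID:SECRET@BUCKET/input s3n://ID:SECRET@BUCKET/output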

2012/10/16 Rahul Patodi <[EMAIL PROTECTED]>

> I think these blog posts will answer your question:
>
>
> http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html
>
> http://www.technology-mania.com/2011/05/s3-as-input-or-output-for-hadoop-mr.html
>
>
>
> On Tue, Oct 16, 2012 at 1:30 PM, sudha sadhasivam <
> [EMAIL PROTECTED]> wrote:
>
>> Is there a time delay in fetching information from S3 to the Hadoop
>> cluster compared to a regular Hadoop cluster setup? Can Elastic Block
>> Storage be used for this purpose?
>> G Sudha
>>
>> --- On *Tue, 10/16/12, Hemanth Yamijala <[EMAIL PROTECTED]>* wrote:
>>
>>
>> From: Hemanth Yamijala <[EMAIL PROTECTED]>
>> Subject: Re: problem using s3 instead of hdfs
>> To: [EMAIL PROTECTED]
>> Date: Tuesday, October 16, 2012, 12:41 PM
>>
>>
>> Hi,
>>
>> I've not tried this on S3. However, the directory mentioned in the
>> exception is based on the value of this particular configuration
>> key: mapreduce.jobtracker.staging.root.dir. This defaults
>> to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
>> location and try?
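>> For example (a sketch only, reusing the placeholder bucket from this
>> thread), an entry like this in mapred-site.xml:
>>
>> <property>
>>   <name>mapreduce.jobtracker.staging.root.dir</name>
>>   <value>s3n://ID:SECRET@BUCKET/mapred/staging</value>
>> </property>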
>>
>> Thanks
>> Hemanth
>>
>> On Mon, Oct 15, 2012 at 10:43 PM, Parth Savani <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>>       I am trying to run Hadoop on S3 in distributed mode. However, I am
>> having issues running my job successfully on it, and I get the error below.
>> I followed the instructions provided in this article ->
>> http://wiki.apache.org/hadoop/AmazonS3
>> I replaced the fs.default.name value in my hdfs-site.xml with
>> s3n://ID:SECRET@BUCKET
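>> i.e., an entry like the following in hdfs-site.xml (ID, SECRET and BUCKET
>> are placeholders):
>>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>s3n://ID:SECRET@BUCKET</value>
>> </property>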
>> And I am running my job using the following: hadoop jar
>> /path/to/my/jar/abcd.jar /input /output
>> where */input* is the folder name inside the S3 bucket
>> (s3n://ID:SECRET@BUCKET/input)
>> and the */output* folder should be created in my bucket
>> (s3n://ID:SECRET@BUCKET/output).
>> Below is the error I get. It is looking for job.jar on S3, and that path
>> is on my server, from where I am launching my job.
>>
>> java.io.FileNotFoundException: No such file or directory
>> '/opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging/job_201207021606_1036/job.jar'
>>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:412)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
>>   at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:222)
>>   at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1372)
>>   at java.security.AccessController.doPri
>>
>>
>>
>>
>>
>>
>
>
> --
> *Regards*,
> Rahul Patodi
>
>
>