MapReduce, mail # user - Re: problem using s3 instead of hdfs


Hemanth Yamijala 2012-10-16, 07:11
sudha sadhasivam 2012-10-16, 08:00
Rahul Patodi 2012-10-16, 10:23
Re: problem using s3 instead of hdfs
Yanbo Liang 2012-10-16, 10:59
Because you did not set defaultFS in the conf, you need to explicitly indicate
the absolute path (including the scheme) of the file in S3 when you run an MR job.
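
For example, with no default filesystem configured, the input and output can be
passed as fully qualified s3n:// URIs. A minimal sketch, reusing the
ID:SECRET@BUCKET placeholders and jar path from the original post below:

    # Fully qualified S3 URIs, so nothing is resolved against the default FS
    hadoop jar /path/to/my/jar/abcd.jar \
        s3n://ID:SECRET@BUCKET/input \
        s3n://ID:SECRET@BUCKET/output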

2012/10/16 Rahul Patodi <[EMAIL PROTECTED]>

> I think these blog posts will answer your question:
>
>
> http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html
>
> http://www.technology-mania.com/2011/05/s3-as-input-or-output-for-hadoop-mr.html
>
>
>
> On Tue, Oct 16, 2012 at 1:30 PM, sudha sadhasivam <
> [EMAIL PROTECTED]> wrote:
>
>> Is there a time delay when fetching information from S3 to the Hadoop cluster
>> compared to a regular Hadoop cluster setup? Can Elastic Block Storage be used
>> for this purpose?
>> G Sudha
>>
>> --- On *Tue, 10/16/12, Hemanth Yamijala <[EMAIL PROTECTED]>* wrote:
>>
>>
>> From: Hemanth Yamijala <[EMAIL PROTECTED]>
>> Subject: Re: problem using s3 instead of hdfs
>> To: [EMAIL PROTECTED]
>> Date: Tuesday, October 16, 2012, 12:41 PM
>>
>>
>> Hi,
>>
>> I've not tried this on S3. However, the directory mentioned in the
>> exception is based on the value of this particular configuration
>> key: mapreduce.jobtracker.staging.root.dir. This defaults
>> to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
>> location and try?
>>
>> Thanks
>> Hemanth
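
A minimal sketch of that suggestion, assuming the job driver runs through
ToolRunner (so the generic -D option is parsed) and reusing the
ID:SECRET@BUCKET placeholders from the original post; the /staging suffix is
arbitrary:

    # Point the job staging area at S3 instead of the local ${hadoop.tmp.dir} default
    hadoop jar /path/to/my/jar/abcd.jar \
        -D mapreduce.jobtracker.staging.root.dir=s3n://ID:SECRET@BUCKET/staging \
        s3n://ID:SECRET@BUCKET/input \
        s3n://ID:SECRET@BUCKET/output

Depending on the Hadoop version, the property may instead need to go into
mapred-site.xml on the JobTracker so that it is picked up at job-submission time.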
>>
>> On Mon, Oct 15, 2012 at 10:43 PM, Parth Savani <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>>       I am trying to run Hadoop on S3 in distributed mode. However, I am
>> having issues running my job successfully on it; I get the error below.
>> I followed the instructions provided in this article:
>> http://wiki.apache.org/hadoop/AmazonS3
>> I replaced the fs.default.name value in my hdfs-site.xml with
>> s3n://ID:SECRET@BUCKET
>> And I am running my job using the following: hadoop jar
>> /path/to/my/jar/abcd.jar /input /output
>> where */input* is the folder name inside the S3 bucket
>> (s3n://ID:SECRET@BUCKET/input)
>> and the */output* folder should be created in my bucket
>> (s3n://ID:SECRET@BUCKET/output).
>> Below is the error I get. It is looking for job.jar on S3, but that path
>> is on my server, from which I am launching my job.
>>
>> java.io.FileNotFoundException: No such file or directory '/opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging/job_201207021606_1036/job.jar'
>>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:412)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
>>   at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
>>   at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:222)
>>   at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1372)
>>   at java.security.AccessController.doPri
>>
>
>
> --
> *Regards*,
> Rahul Patodi
>
>
>
Parth Savani 2012-10-16, 14:32
Hemanth Yamijala 2012-10-16, 15:10
Parth Savani 2012-10-16, 14:34