HDFS user mailing list: Difference between HDFS and local filesystem


Sundeep Kambhampati 2013-01-26, 15:49
Re: Difference between HDFS and local filesystem
The local filesystem has no sense of being 'distributed'. If you run a
distributed mode of Hadoop over file:// (the local FS), then unless the
location that file:// points to is itself distributed (such as an NFS
mount), your jobs will fail their tasks on every node where the referenced
files cannot be found.

Essentially, for distributed operation, MR relies on a distributed
filesystem, and a local filesystem is the opposite of that.
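
To make this concrete: the scheme on a path is what selects the FileSystem
implementation the framework and its tasks talk to. Below is a minimal sketch
of that resolution using the standard org.apache.hadoop.fs.FileSystem API; the
hdfs:// host and port are just placeholders.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class FsResolution {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration(); // loads core-site.xml if present

      // file:// resolves to LocalFileSystem: each task reads its own node's disk,
      // so the path must exist on every node (e.g. via NFS) or the task fails.
      FileSystem local =
          FileSystem.get(URI.create("file:///usr/local/ncdcinput/sample.txt"), conf);
      System.out.println(local.getClass().getSimpleName()); // LocalFileSystem

      // hdfs:// resolves to DistributedFileSystem: blocks are replicated across
      // datanodes, so any node can fetch the data wherever its task runs.
      // (hdfs://namenode:9000 is a hypothetical namenode address.)
      FileSystem dfs =
          FileSystem.get(URI.create("hdfs://namenode:9000/ncdcinput/sample.txt"), conf);
      System.out.println(dfs.getClass().getSimpleName()); // DistributedFileSystem
    }
  }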

On Sat, Jan 26, 2013 at 9:19 PM, Sundeep Kambhampati
<[EMAIL PROTECTED]> wrote:
> Hi Users,
> I am kind of new to MapReduce programming and I am trying to understand the
> integration between MapReduce and HDFS.
> I understand that MapReduce can use HDFS for data access. But is it possible
> not to use HDFS at all and still run MapReduce programs?
> HDFS does file replication and partitioning. But if I use the following
> command to run the example MaxTemperature
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
>
> instead of
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> usr/local/ncdcinput/sample.txt usr/local/out4     (this one will use the HDFS
> filesystem),
>
> it uses local filesystem files and writes to the local filesystem when I run
> in pseudo-distributed mode. Since it is a single node, there is no problem of
> non-local data.
> What happens in fully distributed mode? Will the files be copied to other
> machines, or will it throw errors? Will the files be replicated, and will they
> be partitioned for running MapReduce if I use the local filesystem?
>
> Can someone please explain.
>
> Regards
> Sundeep

--
Harsh J
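
For reference, the two quoted invocations differ only in whether the paths
carry a scheme. A scheme-less path is resolved against the configured default
filesystem (fs.default.name on older releases, fs.defaultFS later), which is
why the second command hits HDFS in pseudo-distributed mode. A minimal sketch
of that resolution, assuming a core-site.xml that points the default at a
hypothetical hdfs://localhost:9000:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;

  public class DefaultFsResolution {
    public static void main(String[] args) throws Exception {
      // In pseudo-distributed mode core-site.xml typically carries something like
      //   fs.default.name = hdfs://localhost:9000   (hypothetical host and port)
      Configuration conf = new Configuration();

      // No scheme: the path is resolved against the configured default filesystem,
      // so the job reads from and writes to HDFS.
      Path onDefaultFs = new Path("usr/local/ncdcinput/sample.txt");
      System.out.println(onDefaultFs.getFileSystem(conf).getUri()); // hdfs://localhost:9000 with the config above

      // Explicit file:/// scheme: the default is bypassed and the job stays on
      // the local disk of whichever node each task happens to run on.
      Path onLocalFs = new Path("file:///usr/local/ncdcinput/sample.txt");
      System.out.println(onLocalFs.getFileSystem(conf).getUri()); // file:///
    }
  }
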
Preethi Vinayak Ponangi 2013-01-26, 16:46