Difference between HDFS and local filesystem
Hi Users,
I am fairly new to MapReduce programming, and I am trying to understand
the integration between MapReduce and HDFS.
I understand that MapReduce can use HDFS for data access, but is it
possible to run MapReduce programs without using HDFS at all?
HDFS handles file replication and partitioning. But suppose I run the
MaxTemperature example with the following command

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4

instead of

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature usr/local/ncdcinput/sample.txt usr/local/out4

which would use the HDFS filesystem.
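
From what I can tell, the scheme prefix on the path decides which
filesystem a job talks to, and a path with no scheme falls back to the
default filesystem configured as fs.default.name in conf/core-site.xml
(please correct me if I have this wrong). A quick sanity check from my
pseudo-distributed setup, using the paths from my example above:

  bin/hadoop fs -ls file:///usr/local/ncdcinput   # explicit file:// scheme: lists the local filesystem
  bin/hadoop fs -ls /usr/local/ncdcinput          # no scheme: resolved against fs.default.name (HDFS here)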

With the file:// URIs, the job reads from and writes to the local
filesystem when I run in pseudo-distributed mode. Since it is a single
node, there is no problem of non-local data.
What happens in fully distributed mode? Will the files be copied to the
other machines, or will the job throw errors? Will the files be
replicated and partitioned for MapReduce if I use the local filesystem?
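
(In case it helps to see what I am comparing: this is how I have been
checking where the output of each run ends up, with the out4 paths taken
from my commands above.)

  ls /usr/local/out4                  # output of the file:// run, on the local disk
  bin/hadoop fs -ls usr/local/out4    # output of the HDFS run; the relative path resolves under my HDFS home directory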

Can someone please explain?

Regards
Sundeep