Subject: When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?


When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?
By default, Hadoop sets hadoop.tmp.dir to a folder under /tmp. This is a
problem, because /tmp gets wiped out by Linux when you reboot, leading to
this lovely error from the JobTracker:

2012-10-05 07:41:13,618 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
...
2012-10-05 07:41:22,636 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 07:41:22,643 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
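
(For context, the out-of-the-box default in core-default.xml is a plain
local path under /tmp; on 0.20.x it reads:)

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
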
The only way I've found to fix this is to reformat the namenode, which
rebuilds the /tmp/hadoop-root folder; of course, that folder gets wiped out
again when you reboot.
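
(For reference, the reformat step I mean is the usual one, run from the
Hadoop install directory:)

bin/hadoop namenode -format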

So I went ahead and created a folder called /hadoop_temp and gave all users
read/write access to it (setup commands sketched after the snippet below).
I then set this property in my core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>file:///hadoop_temp</value>
</property>
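
(The folder setup was roughly the following; the exact permission bits are
from memory:)

sudo mkdir /hadoop_temp
sudo chmod 777 /hadoop_temp    # read/write (and traverse) for all users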

When I reformatted my namenode, Hadoop seemed happy, giving me this
message:

12/10/05 07:58:54 INFO common.Storage: Storage directory
file:/hadoop_temp/dfs/name has been successfully formatted.
However, when I looked at /hadoop_temp, I noticed that the folder was
empty. Then, when I restarted Hadoop and checked my JobTracker log, I
saw this:

2012-10-05 08:02:41,988 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
...
2012-10-05 08:02:51,010 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 08:02:51,011 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
connection exception: java.net.ConnectException: Connection refused
And when I checked my namenode log, I saw this:

2012-10-05 08:00:31,206 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name does
not exist.
2012-10-05 08:00:31,212 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:
Directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name is in an
inconsistent state: storage directory does not exist or is not accessible.
So, clearly I didn't configure something right. Hadoop still expects to see
its files in the /tmp folder even though I set hadoop.tmp.dir to
/hadoop_temp in core-site.xml. What did I do wrong? What's the accepted
"right" value for hadoop.tmp.dir?

Bonus question: what should I use for hbase.tmp.dir?
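
(I assume the HBase side follows the same pattern in hbase-site.xml; the
/hbase_temp path below is just a placeholder:)

<property>
  <name>hbase.tmp.dir</name>
  <value>/hbase_temp</value>
</property>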

System info:

Ubuntu 12.04, Apache Hadoop 0.20.2, Apache HBase 0.92.1

Thanks for taking a look!

--Jeremy