mismatched dfs.data.dir
Hey hdfs gurus -

One of my clusters is going through disk upgrades, and not all machines have
a homogeneous disk layout during the transition period. At first I started
looking into auto-generating dfs.data.dir from each machine's current disk
profile, but then looked at how disks are actually made available to the
datanode.

Looking at makeInstance() (quoted below), we see that each directory listed
in dfs.data.dir is tested and, if usable, added to the list of data
directories. As long as at least one directory passes the check, a datanode
is started with just the usable directories.

Does it seem reasonable to push one config to all hosts listing the full
post-upgrade set of disks? As machines are upgraded and the new mount points
appear, the extra disks will be picked up. Machines not yet upgraded will
simply ignore the missing directories (the datanode user has no permission
to create the missing dirs, so they fail the disk check and are skipped).
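
For illustration, the pushed hdfs-site.xml could just list every mount
point in the final layout (the paths here are hypothetical):

   <property>
     <name>dfs.data.dir</name>
     <value>/data1/dfs/data,/data2/dfs/data,/data3/dfs/data,/data4/dfs/data</value>
   </property>

On a machine that still has only /data1 and /data2, the last two entries
would fail the disk check and be dropped at startup.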

 public static DataNode makeInstance(String[] dataDirs, Configuration conf)
   throws IOException {
   ArrayList<File> dirs = new ArrayList<File>();
   for (int i = 0; i < dataDirs.length; i++) {
     File data = new File(dataDirs[i]);
     try {
       // keep only the directories that pass the disk check
       DiskChecker.checkDir(data);
       dirs.add(data);
     } catch(DiskErrorException e) {
       // an unusable directory is logged and skipped, not fatal
       LOG.warn("Invalid directory in dfs.data.dir: " + e.getMessage());
     }
   }
   // start the datanode as long as at least one directory is usable
   if (dirs.size() > 0)
     return new DataNode(conf, dirs);
   LOG.error("All directories in dfs.data.dir are invalid.");
   return null;
 }
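
For reference, DiskChecker.checkDir() is what makes the permissions trick
work. From memory it looks roughly like this (a sketch, not the exact
source):

 public static void checkDir(File dir) throws DiskErrorException {
   // tries to create the directory first, so a missing dir only fails
   // the check when the datanode user cannot mkdir it
   if (!mkdirsWithExistsCheck(dir))
     throw new DiskErrorException("can not create directory: " + dir);
   if (!dir.isDirectory())
     throw new DiskErrorException("not a directory: " + dir);
   if (!dir.canRead())
     throw new DiskErrorException("directory is not readable: " + dir);
   if (!dir.canWrite())
     throw new DiskErrorException("directory is not writable: " + dir);
 }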

Thoughts?

--travis