We'd like to use CMake instead of autotools to build native (C/C++) code in
Hadoop. There are a lot of reasons to want to do this. For one thing, it is
not feasible to use autotools on the Windows platform, because it depends on
UNIX shell scripts, the m4 macro processor, and some other pieces of
infrastructure which are not present on Windows.
For another thing, CMake builds are substantially simpler and faster, because
there is only one layer of generated code. With autotools, automake and
autoconf process m4 macros to generate a UNIX shell script (configure), which
in turn runs more generated shell code and eventually produces the Makefiles.
CMake generates Makefiles directly from CMakeLists.txt files -- much simpler
to understand and debug, and much faster. CMake is also a lot easier to learn.
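To make the contrast concrete, here is a minimal, hypothetical CMakeLists.txt
for a small native library. The project and file names are illustrative only,
not taken from the actual Hadoop source tree:

```cmake
# Minimal sketch: build a shared C library plus a small test program.
cmake_minimum_required(VERSION 2.6)
project(example C)

# Build libexample.so from one (hypothetical) source file.
add_library(example SHARED example.c)

# Build a test executable and link it against the library.
add_executable(example_test example_test.c)
target_link_libraries(example_test example)
```

A typical out-of-source build is then just "mkdir build; cd build;
cmake ..; make". There is no generated configure script to regenerate or
debug; CMake rereads the CMakeLists.txt files directly on each run.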
Autotools error messages can be very, very confusing, because you are
essentially debugging a pile of shell scripts and macros rather than a
coherent whole. So you see error messages like "autoreconf: cannot empty
/tmp/ar0.4849: Is a directory" or "Can't locate object method "path" via
package "Autom4te..." and so forth. CMake error messages come from the CMake
application itself, and they almost always point you directly at the problem.
From a build point of view, the net result of adopting CMake would be that you
would no longer need automake and related programs installed to build the
native parts of Hadoop. Instead, you would need CMake installed. CMake is
packaged by Red Hat, even in RHEL5, so it shouldn't be difficult to install
locally. It's also available for Mac OS X and Windows, as I mentioned earlier.
The JIRA for this work is at https://issues.apache.org/jira/browse/HADOOP-8368
Thanks for reading.