I am using Hadoop 0.20.2 for data analysis at my company. I did not upgrade
to Hadoop 0.21 because of the note in
On Wed, Dec 22, 2010 at 7:39 PM, Eric <[EMAIL PROTECTED]> wrote:
> This question may have been asked numerous times, and the answer will
> probably come down to the specific situation you are in, but I'm going to
> ask anyway:
> Which Hadoop version should I pick?
> I'm currently running Cloudera's CDH3 beta release, but I'm very tempted to
> install the latest Apache 0.21 version instead.
> Problems I encountered are:
> * Cloudera's distribution has bugs, like pid file directories that
> disappear after a reboot (because they live on a memory-backed disk).
> * I'm writing code against deprecated libraries :-( The new libraries are
> not yet complete in the 0.20.x releases.
> I'm not (yet) running a production cluster, but I'm planning on turning it
> into a production cluster in a few months. I do not feel comfortable writing
> code against deprecated libraries, but I also don't feel comfortable
> installing a Hadoop release that is not well tested and declared stable.
> I am experimenting now, so chances are that 0.21 will become stable over the
> coming months and will be a stable release by the time I go into production.
> If I may ask, what are you running? I can imagine large companies are not
> running the latest version of Hadoop and/or HBase. Or am I wrong? Are you
> guys patching old releases, or are you keeping up with new releases instead?
> Are there advantages to running Cloudera's packages instead of the Apache
> releases (besides that it is slightly easier to install)?
> Thank you in advance. All comments and suggestions are welcome!