MapReduce, mail # user - Re: Cloudera Vs Hortonworks Vs MapR

Re: Cloudera Vs Hortonworks Vs MapR
Chris Embree 2013-09-17, 01:17
Our evaluation was similar, except we did not consider the "management"
tools any vendor provided, as that's just as much lock-in as any proprietary
tool.  What if I want to trade vendors?  I'd have to re-tool to use their mgmt?
 Nope, wrote our own.

Being in a large enterprise, we went with the "perceived" more stable
platform.  Draw your own conclusions.
On Mon, Sep 16, 2013 at 6:10 PM, Xuri Nagarin <[EMAIL PROTECTED]> wrote:

> So I will try to answer the OP's question best I can without deviating too
> much into opinions and stick to facts. Disclaimer: I am not an employee of
> either vendor or any partner of theirs.
> Context is important: My team's use case was general data exploration of
> semi-structured log data and we had no typical data-warehouse type of
> existing use cases. Also, ours is a small cluster (fewer than 30 nodes). In
> terms of ops/maintenance, we only have one person. I point this out because
> lots of Hadoop shops have a dedicated team for each role - OS administration,
> Hadoop admin, Hadoop developers. And, they are very mature in terms of
> their compute use cases. To my mind, these aspects can significantly impact
> your vendor choices.
> MapR: My team simply did not consider them because of all the proprietary
> code in there. We are trying to move from a monolithic proprietary product
> and one of the criteria we set was - if we decided to move away from the
> chosen hadoop vendor, can we easily unlock our data?
> Hortonworks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
> management is via Ambari. Compared to Cloudera's CM, Ambari has very
> rudimentary features. But you have to keep in mind that Ambari is only a
> year old, whereas CM has already been under development for several years.
> This was a major selection factor for us because Ambari did not have all
> the automation/feature-set compared to CM for a single
> administrator/developer to easily maintain the cluster. Also, during the
> trial period, Hortonworks' packaging format/structure apparently kept
> changing, which made things a bit difficult to centrally deploy/administer.
> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
> management, which is via their proprietary Cloudera Manager tool. It is free
> to use, but without certain features such as auditing and cluster
> replication. Maybe a few more features are restricted to the
> Enterprise/Licensed-only version. It offers many more features than Ambari. In
> terms of cluster administration, I found CM much easier to work with than
> Ambari. Pretty much all aspects, from deploying new nodes to configuration
> and troubleshooting, are much more refined than in Ambari.
> During the selection process, what I found was that both vendors are very
> aggressive in their pitch. So much so that each pushes some FUD regarding
> the competition.
> HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
> Cloudera's distro is heavily patched and off course from the core Apache trunk,
> which can cause severe data corruption issues. Yes, Cloudera has some 1500+
> patches over Apache's Hadoop distro but (1) they aren't private patches.
> You can pull the list and verify that yourself just as I did. (2) In our
> testing and talking to other Cloudera customers, I couldn't find any issues
> with data corruption. It is true though that HDFS 2.x is still in beta but
> so is MRv2 that HW uses. I think both are stable and work well - depending
> on what you need but each uses that point to create FUD.
> HW also claimed that Impala, a new SQL engine that Cloudera is including in
> its distro, is proprietary. Not true. The software is open source. But
> if you want support for Impala then Cloudera will charge you separately per
> node for Impala over and above what they charge per node for Hadoop support.
> In my experience, both products have plenty of issues when it comes to
> compute engines - Hive, Pig etc. - and their cluster management software. HDFS
> seems to be solid in both distros. So I wouldn't call either of them