MapReduce >> mail # user >> Re: Cloudera Vs Hortonworks Vs MapR


Re: Cloudera Vs Hortonworks Vs MapR
Our evaluation was similar except we did not consider the "management"
tools any vendor provided as that's just as much lock in as any proprietary
tool.  What if I want to trade vendors?  I have to re-tool to use their mgmt?
 Nope, wrote our own.

Being in a large enterprise, we went with the "perceived" more stable
platform.  Draw your own conclusions.
On Mon, Sep 16, 2013 at 6:10 PM, Xuri Nagarin <[EMAIL PROTECTED]> wrote:

> So I will try to answer the OP's question best I can without deviating too
> much into opinions and stick to facts. Disclaimer: I am not an employee of
> either vendor or any partner of theirs.
>
> Context is important: My team's use case was general data exploration of
> semi-structured log data and we had no typical data-warehouse type of
> existing use cases. Also, ours is a small cluster (fewer than 30 nodes). In
> terms of ops/maintenance, we only have one person. I point this out because
> lots of hadoop shops have a dedicated team for each - OS administration,
> Hadoop admin, Hadoop developers. And, they are very mature in terms of
> their compute use cases. To my mind, these aspects can significantly impact
> your vendor choices.
>
> MapR: My team simply did not consider them because of all the proprietary
> code in there. We are trying to move from a monolithic proprietary product
> and one of the criteria we set was - if we decided to move away from the
> chosen hadoop vendor, can we easily unlock our data?
> HortonWorks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
> management is via Ambari. Compared to Cloudera's CM, Ambari has very
> rudimentary features. But you have to keep in mind that Ambari is only a
> year old, whereas CM has already been under development for several years.
> This was a major selection factor for us because Ambari did not have all
> the automation/feature-set compared to CM for a single
> administrator/developer to easily maintain the cluster. Also, during the
> trial period, Hortonworks' packaging format/structure apparently kept
> changing which made things a bit difficult to centrally deploy/administer.
>
> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
> management which is via their proprietary Cloudera Manager tool. It is free
> for use without certain features like auditing and cluster replication.
> Maybe a few more features are restricted to the
> Enterprise/Licensed-only version. Offers many more features than Ambari. In
> terms of cluster administration, I found CM much easier to work with than
> Ambari. Pretty much all aspects from deploying new nodes to configuration
> and troubleshooting are much more refined than Ambari.
>
> During the selection process, what I found was that both vendors are very
> aggressive in their pitch. So much so that each pushes some FUD regarding
> the competition.
>
> HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
> Cloudera's distro is heavily patched and off course from the core Apache
> trunk, which can cause severe data corruption issues. Yes, Cloudera has some 1500+
> patches over apache's Hadoop distro but (1) they aren't private patches.
> You can pull the list and verify that yourself just as I did. (2) In our
> testing and talking to other Cloudera customers, I couldn't find any issues
> with data corruption. It is true though that HDFS 2.x is still in beta but
> so is MRv2 that HW uses. I think both are stable and work well - depending
> on what you need but each uses that point to create FUD.
>
> HW also claimed that a new SQL engine that Cloudera's including in their
> distro, Impala, is proprietary. Not true. The software is open source. But
> if you want support for Impala then Cloudera will charge you separately per
> node for Impala over and above what they charge per node for Hadoop support.
>
> In my experience, both products have plenty of issues when it comes to
> compute engines - Hive, Pig etc and their cluster management software. HDFS
> seems to be solid in both distros. So I wouldn't call either of them