Smith, Joshua D. 2013-09-12, 18:00
Marco Shaw 2013-09-12, 17:51
Aaron Eng 2013-09-12, 18:14
Xuri Nagarin 2013-09-12, 23:39
Chris Mattmann 2013-09-13, 05:50
Chris Embree 2013-09-13, 17:24
Shahab Yunus 2013-09-13, 17:48
Suresh Srinivas 2013-09-13, 18:36
Adam Muise 2013-09-13, 18:01
Xuri Nagarin 2013-09-13, 18:21
Chris Mattmann 2013-09-14, 19:37
Xuri Nagarin 2013-09-16, 22:10
Our evaluation was similar except we did not consider the "management"
tools any vendor provided as that's just as much lock in as any proprietary
tool. What if I want trade vendors? I have to re-tool to use there mgmt?
Nope, wrote our own.
Being in a large enterprise, we went with the "perceived" more stable
platform. Draw your own conclusions.
On Mon, Sep 16, 2013 at 6:10 PM, Xuri Nagarin <[EMAIL PROTECTED]> wrote:
> So I will try to answer the OP's question best I can without deviating too
> much into opinions and stick to facts. Disclaimer: I am not an employee of
> either vendor or any partner of theirs.
> Context is important: My team's use case was general data exploration of
> semi-structured log data and we had no typical data-warehouse type of
> existing use cases. Also, our's is a small (less than 30 nodes cluster). In
> terms of ops/maintenance, we only have one person. I point this out because
> lots of hadoop shops have dedicated team for each - OS administration,
> Hadoop admin, Hadoop developers. And, they are very mature in terms of
> their compute use cases. To my mind, these aspects can significantly impact
> your vendor choices.
> MapR: My team simply did not consider them because of all the proprietary
> code in there. We are trying to move from a monolithic proprietary product
> and one of the criteria we set was - if we decided to move away from the
> chosen hadoop vendor, can we easily unlock our data?
> HortonWorks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
> management is via Ambari. Compared to Cloudera's CM, Ambari has very
> rudimentary features. But you have to keep in mind that Ambari is only an
> year old where as CM already has been under development for several years.
> This was a major selection factor for us because Ambari did not have all
> the automation/feature-set compared to CM for a single
> administrator/developer to easily maintain the cluster. Also, during the
> trial period, Hortonwork's packing format/structure apparently kept
> changing which made things a bit difficult to centrally deploy/administer.
> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
> management which is via their proprietary Cloudera Manager tool. It is free
> for use without certain feature like auditing and cluster replication
> features. Maybe a few more features are restricted to
> Enterprise/Licensed-only version. Offers much more features than Ambari. In
> terms of cluster administration, I found CM much easy to work with than
> Ambari. Pretty much all aspects from deploying new nodes to configuration
> and troubleshooting is much more refined than Ambari.
> During the selection process, what I found was that both vendors are very
> aggressive in their pitch. So much so that each pushes some FUD regarding
> the competition.
> HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
> Cloudera's distro is heavily patched off-course from the core Apache trunk
> that can cause severe data corruption issues. Yes, Cloudera has some 1500+
> patches over apache's Hadoop distro but (1) they aren't private patches.
> You can pull the list and verify that yourself just as I did. (2) In our
> testing and talking to other Cloudera customers, I couldn't find any issues
> with data corruption. It is true though that HDFS 2.x is still in beta but
> so is MRv2 that HW uses. I think both are stable and work well - depending
> on what you need but each uses that point to create FUD.
> HW also claimed that a new SQL engine that Cloudera's including in their
> distro - Impala is proprietary. Not true. The software is open source. But
> if you want support for Impala then Cloudera will charge you separately per
> node for Impala over and above what they charge per node for Hadoop support.
> In my experience, both products have plenty of issues when it comes to
> compute engines - Hive, Pig etc and their cluster management software. HDFS
> seem to be solid in both distros. So I wouldn't call either of them
M. C. Srivas 2013-09-17, 03:37
Xuri Nagarin 2013-09-18, 18:41
Suresh Srinivas 2013-09-12, 18:09