RE: CDH and Hadoop

Rita,

It sounds like you're only using Hadoop and have no intention of really getting into the internals.

I'm like most admins/developers/IT guys and I'm pretty lazy.
I find it easier to set up the yum repository and then issue the yum install hadoop command.
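On a Red Hat/CentOS box it boils down to something like the sketch below. (Treat the repo file URL and the package names as illustrative; the exact names depend on which CDH version and OS release you're installing.)

    # Sketch only: repo URL and package names vary with the CDH version.
    # Drop Cloudera's repo definition into yum's config:
    sudo wget -O /etc/yum.repos.d/cloudera-cdh3.repo \
        http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo
    # Pull in the base package plus whichever daemons this node should run:
    sudo yum install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-datanode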

The thing about Cloudera is that they do backport patches, so while their release is 'heavily patched', it usually stays in some sort of sync with the Apache release. Since you're only working with HDFS and it's pretty stable, I'd say go with the Cloudera release.

HTH

-Mike
----------------------------------------
> Date: Wed, 23 Mar 2011 11:12:30 -0400
> Subject: Re: CDH and Hadoop
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: [EMAIL PROTECTED]
>
> Mike,
>
> Thanks. This helps a lot.
>
> At our lab we have close to 60 servers which only run HDFS. I don't need
> MapReduce and the other bells and whistles. We just use HDFS for storing
> dataset results ranging from 3 GB to 90 GB.
>
> So, what is the best practice for HDFS? Should I always deploy one version
> behind? I understand that Cloudera's version is heavily patched (similar to
> the Red Hat Linux kernel versus the standard Linux kernel).
>
>
>
>
>
>
> On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel wrote:
>
> >
> > Rita,
> >
> > Short answer...
> >
> > Cloudera's release is free, and they also offer a support contract if
> > you want support from them.
> > Cloudera makes the sources available, but most people use yum (Red Hat/CentOS)
> > to download an already-built release.
> >
> > Should you use it?
> > Depends on what you want to do.
> >
> > If your goal is to get up and running with Hadoop and then focus on *using*
> > Hadoop/HBase/Hive/Pig/etc... then it makes sense.
> >
> > If your goal is to do a deep dive into Hadoop and get your hands dirty
> > mucking around with the latest and greatest in trunk? Then no. You're better
> > off building your own from the official Apache release.
> >
> > Many companies choose Cloudera's release for the following reasons:
> > * Paid support is available.
> > * Companies focus on using a technology, not developing it, so Cloudera does
> > the heavy lifting while client companies focus on 'USING' Hadoop.
> > * Cloudera's release makes sure that the versions in the release work
> > together. That is, when you download CDH3B4, you get a version of
> > Hadoop that will work with the included versions of HBase, Hive, etc ...
> >
> > And no, it's never a good idea to try to mix and match Hadoop from
> > different environments and versions in a cluster.
> > (I think it will barf on you.)
> >
> > Does that help?
> >
> > -Mike
> >
> >
> > ----------------------------------------
> > > Date: Wed, 23 Mar 2011 10:29:16 -0400
> > > Subject: CDH and Hadoop
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > >
> > > I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
> > > instead of the standard Hadoop distribution.
> > >
> > > What do most people use? Is CDH free? Do they provide tars, or do they
> > > provide source code that I simply compile? Can I have some data nodes
> > > running CDH and the rest running regular Hadoop?
> > >
> > >
> > > I am asking this because I have so far noticed a serious bug (IMO) in the
> > > decommissioning process (
> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+[EMAIL PROTECTED]%3e
> > > )
> > >
> > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> >
> >
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--