Hadoop, mail # user - Hadoop On Cluster


Re: Hadoop On Cluster
Jeff Hammerbacher 2009-09-23, 19:51
Hey Brian,

Having tried and failed to use NFS to store shared resources for a large
Hadoop cluster, I feel the need to say: you may want to reconsider that
strategy as your cluster grows. NFS mounts can be quite flaky at scale, as
Ted mentions. As Allen mentions, the Distributed Cache is intended to allow
access to shared resources on the cluster; see
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#DistributedCache
for more information.
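
For the streaming-scripts case, here is a rough Python sketch of the idea, not a tested recipe. It names each script archive after a hash of its contents, so a new version never replaces one that a running job depends on, and it builds the streaming arguments that point a job at one specific archive. The helper names, paths, and namenode URI are made up for illustration. I am also assuming the streaming `-cacheArchive` option with a `#link` fragment, so check the option names against the streaming docs for your release.

```python
import hashlib


def versioned_archive_name(prefix, payload):
    """Name an archive after a hash of its bytes (hypothetical helper).

    Different contents give different names, so uploading a new version
    never overwrites the archive an already-running job is reading.
    """
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{prefix}-{digest}.tgz"


def streaming_args(archive_uri, link_name, mapper_script, input_path, output_path):
    """Build streaming arguments that take the mapper from a cached archive.

    Assumption: '-cacheArchive URI#link' unpacks the archive on each node
    and exposes it as 'link' in the task's working directory.
    """
    return [
        "-input", input_path,
        "-output", output_path,
        "-cacheArchive", f"{archive_uri}#{link_name}",
        "-mapper", f"{link_name}/{mapper_script}",
    ]


if __name__ == "__main__":
    # Placeholder bytes standing in for a real tarball of scripts.
    name = versioned_archive_name("scripts", b"contents of the script tarball")
    # Hypothetical HDFS location; the archive must be uploaded there first.
    uri = f"hdfs://namenode:8020/shared/{name}"
    print(" ".join(streaming_args(uri, "scripts", "map.py", "/data/in", "/data/out")))
```

Each job submission then names the exact archive it wants, which is what lets two versions of the same scripts run side by side.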

Later,
Jeff

On Wed, Sep 23, 2009 at 10:19 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

>
>
>
> On 9/23/09 10:09 AM, "Brian Vargas" <[EMAIL PROTECTED]> wrote:
>
> > Although it can be quite useful to store small shared resources on an
> > NFS mount.  For example, I find it easier to store various scripts
> > called by a streaming job on NFS rather than distributing them from the
> > command-line.
> >
> > Of course, then you have to be sure they don't change out from under the
> > running jobs.  Tradeoffs.  :-)
>
> You should probably look into distributed cache archives.  This eliminates
> the NFS bottleneck, avoids the 'magically changing file' problem, and allows
> you to use different versions with different job submissions such that you
> can test changes on the fly without having to redeploy.