Re: Hadoop On Cluster
Jeff Hammerbacher 2009-09-23, 19:51
Hey Brian,
Having tried and failed to use NFS to store shared resources for a large Hadoop cluster, I feel the need to say: you may want to reconsider that strategy as your cluster grows. NFS mounts can be quite flaky at scale, as Ted mentions.

As Allen mentions, the Distributed Cache is intended to allow access to shared resources on the cluster; see http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#DistributedCache for more information.

Later,
Jeff

On Wed, Sep 23, 2009 at 10:19 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

> On 9/23/09 10:09 AM, "Brian Vargas" <[EMAIL PROTECTED]> wrote:
>
> > Although it can be quite useful to store small shared resources on an
> > NFS mount. For example, I find it easier to store various scripts
> > called by a streaming job on NFS rather than distributing them from the
> > command-line.
> >
> > Of course, then you have to be sure they don't change out from under the
> > running jobs. Tradeoffs. :-)
>
> You should probably look into distributed cache archives. This eliminates
> the NFS bottleneck, avoids the 'magically changing file' problem, and
> allows you to use different versions with different job submissions such
> that you can test changes on the fly without having to redeploy.
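
For anyone wondering what the distributed cache archive approach might look like for a streaming job, here is a rough sketch, not a tested recipe. It only assembles the command line; the `-cacheArchive`, `-mapper`, `-input`, and `-output` options are Hadoop Streaming's own, but the jar location, HDFS paths, archive names, and script name below are all made-up placeholders, and the helper function itself is hypothetical.

```python
def streaming_command(streaming_jar, archive_uri, link_name, mapper_script,
                      input_path, output_path):
    """Build the argv for a Hadoop Streaming job that ships its scripts
    as a distributed cache archive instead of reading them from NFS.

    The '#link_name' fragment asks Hadoop to expose the unpacked archive
    in the task's working directory under that name.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-cacheArchive", f"{archive_uri}#{link_name}",
        "-mapper", f"{link_name}/{mapper_script}",
        "-input", input_path,
        "-output", output_path,
    ]


# Placeholder values (assumptions, adjust for your cluster).
JAR = "/path/to/hadoop-streaming.jar"

# Two submissions pinned to two different script archives, so a change
# can be tried on one job without touching the other.
stable = streaming_command(JAR, "hdfs:///shared/scripts-v1.tgz", "scripts",
                           "map.py", "/data/in", "/data/out-v1")
trial = streaming_command(JAR, "hdfs:///shared/scripts-v2.tgz", "scripts",
                          "map.py", "/data/in", "/data/out-v2")

print(" ".join(stable))
print(" ".join(trial))
```

Because each job names its own archive, a running job keeps the version it was submitted with, which is the 'magically changing file' problem Allen mentions going away. Pass the list to `subprocess.run` if you want to actually submit it.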