Re: Can we declare some HDFS nodes "primary"
Colin McCabe 2012-12-11, 23:53
It seems like those 10 nodes out of 500 would be a hot spot for
writes, if there was a hard requirement that they accept all writes.
This might be acceptable on a very read-focused workload, but are you
sure that's what you've got?

Another consideration is that if Amazon goes down, my understanding is
that all instance storage will be toast. I think they guarantee that
things stored on EBS, S3, or Glacier are durable; instance storage, not
so much. Power outages do happen after all! On the other hand, I am
not an expert about Amazon's offerings; maybe someone else can clarify
the exact guarantees they provide.

You could consider bumping up the replication factor to something
above 3, and relying on the improbability of a 5x (or whatever)
instance failure. You might also consider periodically rsyncing the
data to S3.

Colin

On Tue, Dec 11, 2012 at 3:39 AM, David Parks <[EMAIL PROTECTED]> wrote:
> Assume for a moment that you have a large cluster of 500 AWS spot instance
> servers running. And you want to keep the bid price low, so at some point
> it’s likely that the whole cluster will get axed until the spot price comes
> down some.
>
> In order to maintain HDFS continuity I’d want say 10 servers running as
> normal instances, and I’d want to ensure that HDFS is replicating 100% of
> data to those 10 that don’t run the risk of group elimination.
>
> Is it possible for HDFS to ensure replication to these “primary” nodes?
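
To put a rough number on the hot-spot concern in the reply above, here is a back-of-the-envelope sketch. It is not anything HDFS computes. It assumes a replication factor of 3 and that exactly one replica of every block is forced onto one of the 10 "primary" nodes, picked uniformly, while the other replicas spread evenly over the remaining 490 nodes. The function name and the placement model are illustrative assumptions, not part of the original thread.

```python
def replica_writes_per_block(total_nodes, replication, pinned_nodes=0):
    """Expected replica writes landing on one node for each block written.

    Returns (share_per_pinned_node, share_per_ordinary_node).
    Assumed model (hypothetical, for illustration only):
      - pinned_nodes == 0: replicas are spread uniformly over all nodes.
      - pinned_nodes > 0: exactly one replica of each block goes to a
        uniformly chosen pinned node; the rest go uniformly to the others.
    """
    if pinned_nodes == 0:
        share = replication / total_nodes
        return share, share
    pinned_share = 1.0 / pinned_nodes
    ordinary_share = (replication - 1) / (total_nodes - pinned_nodes)
    return pinned_share, ordinary_share


# Numbers from the thread: 500 nodes, 10 of them "primary".
# Replication factor 3 is an assumption (the usual HDFS default).
baseline, _ = replica_writes_per_block(500, 3)
pinned, ordinary = replica_writes_per_block(500, 3, pinned_nodes=10)

print(f"uniform placement:  {baseline:.4f} replica writes per node per block")
print(f"pinned node:        {pinned:.4f} replica writes per node per block")
print(f"ordinary node:      {ordinary:.4f} replica writes per node per block")
print(f"pinned vs uniform:  {pinned / baseline:.1f}x")
```

Under these assumptions each primary node takes roughly 17 times the write traffic of a node in a uniformly loaded cluster, and the 10 primaries together must also hold a full copy of the data set. Real placement would depend on the block placement policy and rack topology in use, so treat this only as an order-of-magnitude estimate.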