-
Can we declare some HDFS nodes "primary"
David Parks 2012-12-11, 11:39
Assume for a moment that you have a large cluster of 500 AWS spot instance servers running. And you want to keep the bid price low, so at some point it's likely that the whole cluster will get axed until the spot price comes down some.
In order to maintain HDFS continuity I'd want, say, 10 servers running as normal instances, and I'd want to ensure that HDFS replicates 100% of the data to those 10, which don't run the risk of group elimination.
Is it possible for HDFS to ensure replication to these "primary" nodes?
-
Re: Can we declare some HDFS nodes "primary"
Harsh J 2012-12-11, 13:33
Rack awareness with replication factor of 3 on files will help.
You could declare two racks: one containing these 10 nodes and a default rack for all the others. The default rack-aware block placement policy will then handle the placement.
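To make the two-rack suggestion concrete, here is a minimal sketch of a topology script, which is how Hadoop's rack awareness is usually configured: the NameNode runs the script named in `net.topology.script.file.name` (`topology.script.file.name` on Hadoop 1.x) with one or more IPs or hostnames as arguments and reads back one rack path per argument. The IP addresses and the rack names `/primary-rack` and `/default-rack` below are assumptions for illustration only, not values from this thread.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: prints one rack path per host argument."""
import sys

# Assumption: addresses of the 10 normal (non-spot) instances; substitute real ones.
PRIMARY_HOSTS = {"10.0.0.%d" % i for i in range(1, 11)}

PRIMARY_RACK = "/primary-rack"  # hypothetical rack name for the 10 stable nodes
DEFAULT_RACK = "/default-rack"  # hypothetical rack name for the spot instances


def rack_for(host):
    """Return the rack path for a single IP address or hostname."""
    return PRIMARY_RACK if host in PRIMARY_HOSTS else DEFAULT_RACK


def main(hosts):
    # Hadoop may pass several hosts in one call and expects one rack per host.
    for host in hosts:
        print(rack_for(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With only two racks declared, the default policy (first replica on the writer's node, second on a different rack, third on the same rack as the second) should leave each block with at least one replica on each rack, as long as the replication factor is 2 or more. Treat that as the expected behavior to verify on your own cluster rather than a guarantee.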
-
Re: Can we declare some HDFS nodes "primary"
Colin McCabe 2012-12-11, 23:53
It seems like those 10 nodes out of 500 would be a hot spot for writes if there were a hard requirement that they accept all writes. This might be acceptable for a very read-focused workload, but are you sure that's what you've got?
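A rough back-of-the-envelope sketch of the hot-spot concern follows. It assumes exactly one replica of every block lands on one of the 10 nodes and the remaining replicas are spread evenly over the other 490; real placement is less uniform, so read the result as an order of magnitude only. The function name is made up for this example.

```python
def write_load_ratio(total_nodes, primary_nodes, replication=3):
    """Replica writes per primary node divided by replica writes per other node.

    Assumes one replica of each block goes to a primary node and the other
    (replication - 1) replicas are spread evenly over the remaining nodes.
    """
    per_primary = 1.0 / primary_nodes
    per_other = (replication - 1.0) / (total_nodes - primary_nodes)
    return per_primary / per_other


ratio = write_load_ratio(500, 10)
print(round(ratio, 1))  # about 24.5 under the assumptions above
```

In other words, under these assumptions each of the 10 nodes would absorb roughly 25 times the write traffic of an ordinary node.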
Another consideration is that if Amazon goes down, my understanding is that all instance storage will be toast. I think they guarantee that things stored on EBS, S3, or Glacier are durable; instance storage, not so much. Power outages do happen, after all! On the other hand, I am not an expert on Amazon's offerings; maybe someone else can clarify the exact guarantees they provide.
You could consider bumping the replication factor up to something above 3 and relying on the improbability of a 5x (or whatever) simultaneous instance failure. You might also consider periodically copying the data to S3.
Colin
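The "improbability" argument for a higher replication factor can be sketched with a small calculation. It assumes replicas are placed uniformly at random on distinct nodes and that the failed nodes are a random subset of the cluster; neither holds exactly for HDFS or for spot terminations, so this is an illustration, not a durability estimate. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def p_block_lost(total_nodes, failed_nodes, replication):
    """Probability that every replica of one block sits on a failed node.

    Assumes uniform random placement on distinct nodes and a uniformly
    random set of failed nodes.
    """
    if failed_nodes < replication:
        return 0.0  # not enough failures to take out every replica
    return comb(failed_nodes, replication) / comb(total_nodes, replication)


# Compare replication factors when 50 of 500 nodes vanish at once.
for r in (3, 5):
    print(r, p_block_lost(500, 50, r))

# If nearly the whole spot fleet is terminated together, extra replicas barely help.
print(p_block_lost(500, 490, 5))
```

The last line is the case raised in the original question: when almost every node disappears at the same moment, the chance of losing a given block stays high regardless of the replication factor, which is why keeping a replica on nodes outside the spot fleet matters.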