HDFS >> mail # user >> Can we declare some HDFS nodes "primary"


Re: Can we declare some HDFS nodes "primary"
It seems like those 10 nodes out of 500 would become a hot spot for
writes if there were a hard requirement that every write land a
replica on one of them. This might be acceptable for a very
read-focused workload, but are you sure that's what you've got?
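
To put a rough number on the hot-spot concern, here is a back-of-the-envelope
sketch in Python. It assumes the default replication factor of 3, perfectly
even block placement, and exactly one replica of every block pinned to the
10 "primary" nodes; the function name and the even-spread model are
illustrative assumptions, not how HDFS actually schedules writes.

```python
def per_node_write_share(total_nodes, primary_nodes, replication):
    """Fraction of all block writes that lands on a single node.

    Hypothetical helper for illustration only; it assumes replicas are
    spread perfectly evenly within each group of nodes.
    """
    # Uniform placement: the replicas of each block are spread over all nodes.
    uniform = replication / total_nodes
    # Pinned placement: every block keeps exactly one replica on a primary node.
    pinned = 1 / primary_nodes
    return uniform, pinned


uniform, pinned = per_node_write_share(total_nodes=500, primary_nodes=10, replication=3)
print(f"uniform placement: {uniform:.2%} of block writes per node")
print(f"pinned placement:  {pinned:.2%} of block writes per primary node")
print(f"ratio:             {pinned / uniform:.1f}x")
```

Under these assumptions each primary node would take part in about one in
ten block writes, compared with 3 in 500 for a node under even placement.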

Another consideration is that if Amazon goes down, my understanding is
that all instance storage will be toast.  I think they guarantee that
things stored on EBS, S3, or Glacier are durable; instance storage,
not so much.  Power outages do happen, after all!  On the other hand, I
am not an expert on Amazon's offerings, so maybe someone else can
clarify the exact guarantees they provide.

You could consider bumping up the replication factor to something
above 3, and relying on the improbability of losing 5 (or however
many) instances at the same time.  You might also consider
periodically copying the data to S3.
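
As a rough illustration of why a higher replication factor helps, the sketch
below computes the chance that every replica of a block is lost at once. It
assumes instance failures are independent and uses a made-up 1% loss
probability per instance; both are assumptions for the sake of the
arithmetic. Independence in particular does not hold when a spot-price spike
terminates the whole cluster together, so this only speaks to ordinary,
uncorrelated failures.

```python
def prob_all_replicas_lost(p_instance_loss, replication):
    """Probability that every replica of one block is lost simultaneously.

    Assumes each replica sits on a different instance and that instances
    fail independently (an assumption, not a property of spot instances).
    """
    return p_instance_loss ** replication


# 0.01 is a hypothetical per-instance loss probability, not a measured value.
for replication in (3, 5):
    p = prob_all_replicas_lost(0.01, replication)
    print(f"replication={replication}: P(all replicas lost) = {p:.1e}")
```

Each extra replica multiplies the loss probability by the per-instance
failure probability, which is where the "improbability" comes from. It offers
no protection if all instances disappear for the same reason.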

Colin

On Tue, Dec 11, 2012 at 3:39 AM, David Parks <[EMAIL PROTECTED]> wrote:
> Assume for a moment that you have a large cluster of 500 AWS spot instance
> servers running. And you want to keep the bid price low, so at some point
> it’s likely that the whole cluster will get axed until the spot price comes
> down some.
>
> In order to maintain HDFS continuity I’d want say 10 servers running as
> normal instances, and I’d want to ensure that HDFS is replicating 100% of
> data to those 10 that don’t run the risk of group elimination.
>
> Is it possible for HDFS to ensure replication to these “primary” nodes?