Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> HBase and Consistency in CAP


Copy link to this message
-
Re: HBase and Consistency in CAP
Thanks for the overview. It's helpful. Can you also help me understand
why 2 region servers for the same row keys can't be running on the
nodes where blocks are being replicated? I am assuming all the
logs/HFiles etc are already being replicated so if one region server
fails other region server is still taking reads/writes.

On Fri, Dec 2, 2011 at 12:15 PM, Ian Varley <[EMAIL PROTECTED]> wrote:
> Mohit,
>
> Yeah, those are great places to go and learn.
>
> To fill in a bit more on this topic: "partition-tolerance" usually refers to the idea that you could have a complete disconnection between N sets of machines in your data center, but still be taking writes and serving reads from all the servers. Some "NoSQL" databases can do this (to a degree), but HBase cannot; the master and ZK quorum must be accessible from any machine that's up and running the cluster.
>
> Individual machines can go down, as J-D said, and the master will reassign those regions to another region server. So, imagine you had a network switch fail that disconnected 10 machines in a 20-machine cluster; you wouldn't have 2 baby 10-machine clusters, like you might with some other software; you'd just have 10 machines "down" (and probably a significant interruption while the master replays logs on the remaining 10). That would also require that the underlying HDFS cluster (assuming it's on the same machines) was keeping replicas of the blocks on different racks (which it does by default), otherwise there's no hope.
>
> HBase makes this trade-off intentionally, because in real-world scenarios, there aren't too many cases where a true network partition would be survived by the rest of your stack, either (e.g. imagine a case where application servers can't access a relational database server because of a partition; you're just down). The focus of HBase fault tolerance is recovering from isolated machine failures, not the collapse of your infrastructure.
>
> Ian
>
>
> On Dec 2, 2011, at 2:03 PM, Jean-Daniel Cryans wrote:
>
> Get the HBase book:
> http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100
>
> And/Or read the Bigtable paper.
>
> J-D
>
> On Fri, Dec 2, 2011 at 12:01 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Where can I read more on this specific subject?
>
> Based on your answer I have more questions, but I want to read more
> specific information about how it works and why it's designed that
> way.
>
> On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> No, data is only served by one region server (even if it resides on
> multiple data nodes). If it dies, clients need to wait for the log
> replay and region reassignment.
>
> J-D
>
> On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Why is HBase consisdered high in consistency and that it gives up
> parition tolerance? My understanding is that failure of one data node
> still doesn't impact client as they would re-adjust the list of
> available data nodes.
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB