RE: Smallest production HBase cluster
Vincent Barat 2010-07-23, 07:57
We run a similar 3-node cluster on AWS large instances (8 GB RAM). We do constant small writes into hbase (for logging) and constantly run m/r jobs at the same time on the same nodes (using pig). Since we gave our regionservers enough RAM (-Xmx2048m in our case), they have stayed stable. This cluster has not failed for 6 months now, but hbase is not heavily loaded (only constant writes and sequential reads).
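To illustrate the kind of memory budgeting that co-locating a regionserver with m/r tasks on an 8 GB node implies, here is a rough sketch. Only the 8 GB node size and the 2048 MB regionserver heap come from the message above; the datanode, tasktracker, OS reserve, and per-task heap figures are hypothetical placeholders, and `max_task_slots` is a made-up helper name, not part of any Hadoop or HBase tool.

```python
# Rough per-node memory budget for co-locating HBase and MapReduce daemons.
# Only NODE_RAM_MB and REGIONSERVER_HEAP_MB come from the message above;
# every other figure is a hypothetical placeholder.

NODE_RAM_MB = 8 * 1024          # 8 GB node, as stated in the message
REGIONSERVER_HEAP_MB = 2048     # -Xmx2048m, as stated in the message

# Hypothetical values, for illustration only
DATANODE_HEAP_MB = 1024
TASKTRACKER_HEAP_MB = 1024
OS_RESERVE_MB = 1024
TASK_HEAP_MB = 512


def max_task_slots(node_ram_mb, fixed_heaps_mb, os_reserve_mb, task_heap_mb):
    """Return how many task JVMs fit after the fixed daemons and OS reserve."""
    remaining = node_ram_mb - sum(fixed_heaps_mb) - os_reserve_mb
    if remaining <= 0:
        return 0
    return remaining // task_heap_mb


slots = max_task_slots(
    NODE_RAM_MB,
    [REGIONSERVER_HEAP_MB, DATANODE_HEAP_MB, TASKTRACKER_HEAP_MB],
    OS_RESERVE_MB,
    TASK_HEAP_MB,
)
print(f"task slots that fit alongside the regionserver: {slots}")
```

Heap size understates a JVM's real footprint, so a budget like this is an upper bound on what fits rather than a safe target.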
Actually, because pig's hbase loader is too slow, we first copy all logs into regular hdfs files before running m/r jobs. This greatly reduces the load on hbase. It could also allow us to separate the storage cluster from the m/r cluster, which can be a good idea provided that they don't scale the same way, but is a bad idea if your data is huge.
Finally, even though I like the product a lot, I must say that hbase is THE MOST UNSTABLE PIECE of our backend. We never had any trouble with hdfs, m/r or pig, but we had LOTS of difficulties managing and tuning HBase the right way: there is definitely some work to do on reducing memory usage and increasing reliability.
We lost all of our data once because of a crash that led to an inconsistent data structure, but that was with hbase 0.20.2.
My position is that if hbase could be used reliably on small nodes (2 GB RAM), it would be the perfect product :-)
Geoff Hendrey <[EMAIL PROTECTED]> wrote:
>I am running a 3 node cluster. HDFS datanode and Hbase regionserver are
>running on each node. The Hbase master and HDFS namenode run on
>different machines (not "different" in the sense of "not in the
>cluster". Just different in the sense of "not on the same box in the
>cluster"). Quad core, 64-bit JVM, 32 GB RAM. 4 disks per machine. We had
>many troubles getting the cluster to stay alive when paired with an
>asymmetric (big) mapreduce cluster that was writing into Hbase.
>Ultimately, we achieved stability by disabling the WAL from code in our
>mapreduce jobs, and setting the Hfile block size lower than the default
>(we do a lot of random reads in the map phase). There are other tweaks
>that must be made, such as upping the OS file limit. I made a lot of
>posts in May, so you could look in the archive. At present, we're quite
>happy with the cluster.
>From: Paul Smith [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, July 22, 2010 3:56 PM
>To: [EMAIL PROTECTED]
>Subject: Smallest production HBase cluster
>anyone able to share their experience, thoughts on the 'smallest'
>production HBase cluster in operation? Thinking there may be some
>point in the # Nodes scale where one transitions from "that's silly"
>to "that's actually more like it".
>Anyone out there with a small HBase cluster in operation with < 10 nodes
>able to share any information?
>I notice on http://wiki.apache.org/hadoop/Hbase/PoweredBy there are some
>who have even just a 3 node cluster, perhaps that's out of date, but
>curious to know from the community on where people think 'the line'
>needs to be drawn on usage of Hbase.
>To take things to an extreme, is there anyone actually running a
>_single_ HBase node... ? (one would hope that machine is actually
>designed to be a bit more HA than normal) just to take advantage of a