Re: managing 5-10 servers
S Ahmed 2010-11-24, 14:22
So you have 20 nodes for the stumbled upon link redirection service?
Are there any blog posts that go over the setup and what sort of read/write
traffic it gets? Is there a memcached layer that sits in front?
On Tue, Nov 23, 2010 at 4:44 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> I wish I could do a dump of my memory into an ops guide to HBase, but
> currently I don't think there's such a writeup.
> What can go wrong... again it depends on your type of usage. With a
> MR-heavy cluster, it's usually very easy to drive the IO wait through
> the roof and then you'll end up with GC pauses >60 secs caused by CPU
> starvation. Here's a recent example we got when a big Mahout job was running:
> 2010-11-19T18:25:31.173-0800: [GC [ParNew: 114456K->13056K(118016K),
> 103.8190010 secs] 4624541K->4535473K(7154944K), 104.7165690 secs]
> [Times: user=4.45 sys=2.02, real=104.72 secs]
> The trained eye will quickly see that something very bad happened on
> that cluster. Indeed, during post-mortem we saw that somehow that
> machine started swapping which is the Worst Thing Ever (tm) that can
> happen to a machine that runs Java processes. Make sure that your
> memory usage always stays under your total memory, even when all the
> mappers and reducers are using their heap at the fullest. And then
> double check that (which it seems we didn't do).
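A rough Python sketch of the two checks described above: reading the quoted GC line, and budgeting memory for the worst case. The regular expression only targets the ParNew layout shown in this thread, and every name and number in `memory_headroom_mb` (slot counts, heap sizes, OS reserve) is a hypothetical placeholder, not a value from the cluster discussed here.

```python
import re

# Matches the ParNew line layout quoted in this thread. Other collectors and
# JVM versions print different layouts, so this pattern is an assumption.
GC_LINE = re.compile(
    r"\[GC \[ParNew: \d+K->\d+K\(\d+K\),\s*[\d.]+ secs\]\s*"
    r"\d+K->\d+K\(\d+K\),\s*[\d.]+ secs\]\s*"
    r"\[Times: user=(?P<user>[\d.]+) sys=(?P<sys>[\d.]+), "
    r"real=(?P<real>[\d.]+) secs\]"
)

# The log entry from the message above, joined onto one line.
SAMPLE = (
    "2010-11-19T18:25:31.173-0800: [GC [ParNew: 114456K->13056K(118016K), "
    "103.8190010 secs] 4624541K->4535473K(7154944K), 104.7165690 secs] "
    "[Times: user=4.45 sys=2.02, real=104.72 secs]"
)


def summarize_gc_line(line):
    """Return wall-clock vs CPU time for one GC line, or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    cpu = float(m["user"]) + float(m["sys"])
    real = float(m["real"])
    # Wall-clock time far above CPU time means the collector spent most of
    # the pause waiting (for example on swap or a starved CPU), not working.
    return {"real_secs": real, "cpu_secs": cpu,
            "stalled_secs": max(0.0, real - cpu)}


def memory_headroom_mb(total_mb, daemon_heaps_mb, map_slots, reduce_slots,
                       child_heap_mb, os_reserve_mb=1024):
    """Physical memory left if every daemon and task fills its heap.

    All parameters are hypothetical inputs. A JVM also uses memory outside
    its heap, so a positive result is necessary but not sufficient.
    """
    worst_case = (sum(daemon_heaps_mb)
                  + (map_slots + reduce_slots) * child_heap_mb
                  + os_reserve_mb)
    return total_mb - worst_case


if __name__ == "__main__":
    print(summarize_gc_line(SAMPLE))
    # Made-up node: 16 GB RAM, three daemons, 8 map and 4 reduce slots.
    print(memory_headroom_mb(16384, [7000, 1000, 1000], 8, 4, 1024))
```

For the quoted line, the collector used about 6.5 seconds of CPU during a 104.72 second pause, so roughly 98 seconds were spent waiting. A negative headroom from the second function means the node can be pushed into swap when all slots are busy.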
> On a cluster that serves web traffic, and thus must not be MRed
> against, you get the "usual" stuff like bad disks and operator errors.
> On Tue, Nov 23, 2010 at 1:31 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > Are there any writeups on what things to look for?
> > What are some of the things that usually go wrong? Or is that an unfair
> > question :)
> > On Tue, Nov 23, 2010 at 4:22 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >> Constant hand holding no, constant monitoring yes. Do set up Ganglia
> >> and preferably Nagios. Then it depends what you're planning to do with
> >> your cluster... here we have 2x 20 machines in production, the one
> >> that serves live traffic is pretty much doing its own thing by itself
> >> (although I keep a Ganglia tab open on a second monitor) and the
> >> other one is used strictly for MapReduce, on which our internal users
> >> have developed a habit of running very destructive jobs. But to be
> >> fair, it's probably the users that need support the most ;)
> >> J-D
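As one illustration of the kind of monitoring suggested above, here is a minimal sketch of a Nagios-style check written in Python. It relies on the standard plugin convention of exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN, and on the `SwapTotal` and `SwapFree` fields of Linux's `/proc/meminfo`. The thresholds and function names are assumptions chosen for the example, and swap in use is only a proxy for a machine that is actively swapping.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def swap_used_kb(meminfo_text):
    """Parse /proc/meminfo-style text and return swap in use, in kB."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def check_swap(meminfo_text, warn_kb=1, crit_kb=512 * 1024):
    """Return (exit_code, message). Thresholds are hypothetical defaults."""
    try:
        used = swap_used_kb(meminfo_text)
    except (KeyError, ValueError):
        return UNKNOWN, "SWAP UNKNOWN - could not parse meminfo"
    if used >= crit_kb:
        return CRITICAL, "SWAP CRITICAL - %d kB in use" % used
    if used >= warn_kb:
        return WARNING, "SWAP WARNING - %d kB in use" % used
    return OK, "SWAP OK - %d kB in use" % used


def main(path="/proc/meminfo"):
    """Read the live meminfo file, print one status line, return the code."""
    with open(path) as f:
        status, message = check_swap(f.read())
    print(message)
    return status
```

To use it as a plugin, a wrapper script would end with `sys.exit(main())` so that Nagios sees the status as the process exit code.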
> >> On Tue, Nov 23, 2010 at 1:14 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >
> >> > How much of a guru do you have to be to keep say 5-10 servers humming?
> >> >
> >> > I'm a 1-man shop, and I dream of developing a web application, and scaling
> >> > will be a core part of the application.
> >> >
> >> > Is it feasible for a 1-man operation to manage a 5-10 server HBase cluster?
> >> > Is it something that requires hand holding and constant monitoring, or does it
> >> > tend to be hands off?
> >> >