HBase >> mail # user >> Re: HBase dies after some time

Re: HBase dies after some time
Sorry for the late post to this thread...

To add to what Harsh has said...


DO NOT RUN ZOOKEEPER ON THE SAME NODES AS YOUR TASKTRACKERS AND DATANODES.

Sorry for shouting, but that's a core design rule that shouldn't be broken at any cost.

You would be more stable running a single ZK on a control node than you would be running ZKs on the TT/DN nodes.
While a little swap won't kill a Hadoop cluster running just M/R, add HBase and swapping becomes fatal.  This is the core problem with Christian's machine.
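One way to act on this (a sketch, not from the original thread: the value 0 is a common recommendation for HBase nodes, and these commands need root):

```
# Check the current swappiness setting (the kernel default is often 60)
sysctl vm.swappiness

# Tell the kernel to avoid swapping application pages
sysctl -w vm.swappiness=0

# Persist the setting across reboots
echo "vm.swappiness=0" >> /etc/sysctl.conf
```

Lowering swappiness doesn't add memory; if the JVM heaps are overcommitted, the node will still thrash once they fill.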

Because you can run Hadoop on everything from a VM or a single machine to a cluster of 1000+ machines, hardware design is often overlooked.  And with each major hardware vendor creating its own reference architecture, it gets confusing, and you may end up spending $$$ on resources you can't fully take advantage of.
On May 30, 2012, at 2:33 AM, Harsh J wrote:

> You may colocate your ZK with the HBase Master as it's not very heavy.
> Depending on your cluster size, 1-3 may be enough and you can divide
> it among HBM, SNN and perhaps NN/JT machines.
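The placement Harsh describes, a small ensemble spread across the master-role machines, might look like this in zoo.cfg (a sketch; the hostnames are placeholders, not from this thread):

```
# zoo.cfg sketch: a 3-node ensemble on the master-role machines
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=hbase-master:2888:3888
server.2=secondary-nn:2888:3888
server.3=nn-jt:2888:3888
```

Each server also needs a matching myid file under dataDir; an odd-sized ensemble (1 or 3 here) keeps quorum math simple.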
> On Wed, May 30, 2012 at 2:54 AM, Something Something
> <[EMAIL PROTECTED]> wrote:
>> Hmm.. due to budget constraints, I am forced to install ZooKeeper on the
>> same machine that runs TaskTracker.  When a big MR job starts it fires up
>> over 40 tasks, so as you implied this could definitely be related to memory.
>> Should ZooKeepers be started on their own machines?  Right now I have
>> ZooKeeper, HRegionServer & TaskTracker running on the same machine.  This
>> is a bad idea, right?  Is there any way to get ZooKeeper working under
>> these restrictions?
>> By the way, the ZooKeeper log shows this:
>> 2012-05-29 13:56:54,842 - ERROR [CommitProcessor:2:NIOServerCnxn@445] -
>> Unexpected Exception:
>> java.nio.channels.CancelledKeyException
>>        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>>        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>>        at
>> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
>>        at
>> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
>>        at
>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
>>        at
>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>> On Sat, May 26, 2012 at 2:28 AM, Christian Schäfer
>> <[EMAIL PROTECTED]>wrote:
>>> Hi,
>>>  I got exactly the same behaviour and exceptions that you mention on a
>>> local cluster.
>>> In my case the sum of all services' heap sizes was higher than the actual
>>> memory of the machine.
>>> First, sum the heap sizes on your master machine, which is likely running
>>> NameNode, HMaster, ZooKeeper, and maybe also a RegionServer and DataNode.
>>> Then check that this sum is less than your master machine's memory.
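Christian's check is simple arithmetic; as a rough sketch (the heap sizes and RAM figure below are made-up examples, not from his cluster):

```shell
# Hypothetical -Xmx heap sizes in MB for services colocated on one master node
NN=1024; HM=1024; ZK=512; RS=2048; DN=1024
TOTAL=$((NN + HM + ZK + RS + DN))   # 5632 MB
RAM=7680                            # e.g. a machine with ~7.5 GB of memory

echo "total heap: ${TOTAL} MB, physical RAM: ${RAM} MB"
if [ "$TOTAL" -lt "$RAM" ]; then
  echo "heap fits in RAM"
else
  echo "overcommitted: expect swapping once the JVMs fill their heaps"
fi
```

In practice you'd also leave headroom for the OS and its page cache, so "fits" alone isn't a guarantee against swapping.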
>>> Good Luck.
>>> Chris
>>>        From: Something Something <[EMAIL PROTECTED]>
>>>  To:
>>>  Sent: Saturday, 26 May 2012, 3:22
>>>  Subject: HBase dies after some time
>>> Hello,
>>> I recently installed ZooKeeper & HBase on our dedicated Hadoop cluster on
>>> EC2.  The HBase stays active for some time, but after a while it dies with
>>> error messages similar to these:
>>> 2012-05-25 12:09:27,514 ERROR
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>> master:60000-0x5378489312c0004-0x5378489312c0004 Received unexpected
>>> KeeperException, re-throwing exception
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>        at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>  at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>>        at
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)