-Re: Region servers fall after Zookeeper connectivity loss on EC2
Kevin O'dell 2012-07-02, 19:30
Wasn't there an EC2 outage or am I imagining things?
On Mon, Jul 2, 2012 at 2:30 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
> From what I understand, the leap second bug could've hit anytime in the 24
> hours before 23:59:59. We had it start happening early afternoon Sat on a
> few of our boxes.
> On Mon, Jul 2, 2012 at 12:58 PM, Kevin O'dell <[EMAIL PROTECTED]>wrote:
>> How recently would you say this is happening? Did this start last Sat
>> around midnight?
>> On Mon, Jul 2, 2012 at 11:50 AM, Nicolas Thiébaud
>> <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > We have been successfully running a cdh3 HBase cluster on c1.xlarge
>> > instances for over a month, but we recently started hitting what looks
>> > connectivity issues in the clusters. Zookeeper sessions are expired by
>> > zk server and the region servers throw a YouAreDeadException before
>> > crashing.
>> > Could this be imputed to the gc ? Is there anything I can do about it ? I
>> > am monitoring the Ganglia metrics but am unsure of their semantics (where
>> > can I find it?).
>> > I know that running hbase on ec2 is advised against, but we really need
>> > get this working.
>> > Thanks,
>> > Nicolas.
>> > ZooKeeper log: http://pastebin.com/bVjrkRSL
>> > RegionServer log: http://pastebin.com/fU81d8hr
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
Customer Operations Engineer, Cloudera