Re: HBase 0.90.0 region servers dying
Enis Soztutar 2011-02-19, 08:58
Yes indeed but no luck.
On Fri, Feb 18, 2011 at 11:50 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Just to make sure, you did check in the .out file after a failure, right?
> On Thu, Feb 17, 2011 at 10:14 PM, Enis Soztutar
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Thanks everyone for the answers.
> > I had already increased the file descriptor limit to 32768. The region
> > servers and the ZooKeeper processes are dying, but the datanodes and
> > tasktrackers keep running (they are configured with a max heap of 1 GB).
> > The logs do not contain any indication that something is going wrong. The
> > last entries in the logs are typical INFO level messages. I have also
> > checked the kernel logs; the kernel does not report that it is killing the
> > processes either. While
> > testing, two of the servers restarted at different times, which was the
> > original reason that I had suspected a memory error. But after we
> > the power supplies, nodes did not restart, but the processes kept dying.
> > For the load, the YCSB test for 10M records goes on for a while at 4K
> > inserts per sec, but cannot complete due to region servers dying one by
> > one. iostat also shows light CPU and I/O utilization, around 20%. Any more
> > suggestions for debugging would be more than welcome.
> > Thanks,
> > Enis
> > On Wed, Feb 16, 2011 at 5:13 AM, Eric <[EMAIL PROTECTED]> wrote:
> >> Did you increase the max open files on your system (in
> >> /etc/security/limits.conf)?
> >> 2011/2/16 Enis Soztutar <[EMAIL PROTECTED]>
> >> > Hi,
> >> >
> >> > We have newly set up a cluster of 5 nodes, each with 16 GB of RAM. We
> >> > HBase 0.90.0 on top of Hadoop from CDH3. When testing HBase under load
> >> > generated by YCSB, we consistently see region servers dying silently,
> >> > without any logs or exceptions (not even in system logs). We couldn't
> >> > track down the problem, so we tested the same setup on a Rackspace
> >> > cluster with 7 nodes but similar hardware, and we didn't have any
> >> > problem.
> >> >
> >> > We suspect a problem with the RAM or the motherboards, but all memory
> >> > tests pass. I was wondering whether anyone has had similar problems
> >> > before, and whether there is anything you would suggest to nail down
> >> > the issue.
> >> >
> >> > Thanks,
> >> > Enis
> >> >
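
One check related to the limits.conf question above: a raised limit in /etc/security/limits.conf does not always reach an already-running daemon, so it can help to read the limit the process actually has. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/limits is available; the function names are hypothetical and are not part of HBase or Hadoop.

```python
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) open-file limits from the text of /proc/<pid>/limits.

    A value of "unlimited" is returned as None.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            def to_int(value):
                return None if value == "unlimited" else int(value)
            return to_int(match.group(1)), to_int(match.group(2))
    raise ValueError("no 'Max open files' row found")


def read_open_file_limits(pid):
    """Read the effective open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

Calling read_open_file_limits with the pid of a region server process would show whether that process really runs with the 32768 limit mentioned above.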