HBase, mail # user - Region Servers Crashing during Random Reads


RE: Region Servers Crashing during Random Reads
Jonathan Gray 2011-02-03, 20:13
How much heap are you running on your RegionServers?
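One way to answer that from inside a JVM is to read the heap limits through the standard java.lang.management API. The sketch below is hypothetical (the class name HeapReport is made up) and reports only the JVM it runs in. For a running RegionServer, the same MemoryMXBean values would have to be read from that process, for example over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapReport {
    private static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        // Heap usage of *this* JVM, as seen through the platform MemoryMXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // upper bound, normally set by -Xmx; -1 if undefined
        System.out.println("max heap:       " + (max < 0 ? "undefined" : toMiB(max) + " MiB"));
        System.out.println("committed heap: " + toMiB(heap.getCommitted()) + " MiB");
        System.out.println("used heap:      " + toMiB(heap.getUsed()) + " MiB");
    }
}
```

If the JVM is started with -Xmx6g, the first line should report a value close to 6144 MiB. The exact figure can vary slightly with the collector in use.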

6 GB of total RAM is on the low end. For high-throughput applications, I would recommend at least 6-8 GB of heap (so 8+ GB of RAM).
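An undersized heap tends to show up in the GC log quoted below as "promotion failed" entries. With the CMS collector, a ParNew promotion failure means the old generation could not take the objects being promoted, and the JVM typically falls back to a stop-the-world collection, which can last long enough for a ZooKeeper session to time out. The following is a minimal sketch (the class PromotionFailureScan is hypothetical) that lists when such failures happened. It assumes one GC record per line, prefixed with the date stamp written by -XX:+PrintGCDateStamps, and not wrapped as it is in the email.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PromotionFailureScan {
    // Leading date stamp such as "2011-02-03T09:43:07.890-0800" (-XX:+PrintGCDateStamps format).
    private static final Pattern DATE_STAMP =
        Pattern.compile("^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[+-]\\d{4})");

    /** Returns the date stamp of every GC log line that reports "promotion failed". */
    static List<String> promotionFailures(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains("promotion failed")) {
                continue; // ordinary collection, nothing to report
            }
            Matcher m = DATE_STAMP.matcher(line);
            hits.add(m.find() ? m.group(1) : "(no date stamp)");
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shortened copies of the two distinct records from the thread, one record per line.
        List<String> log = Arrays.asList(
            "2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew (promotion failed): 5555K->5540K(5568K), 0.0280950 secs]",
            "2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]");
        for (String stamp : promotionFailures(log)) {
            System.out.println("promotion failure at " + stamp);
        }
    }
}
```

Run as is, it prints "promotion failure at 2011-02-03T09:43:07.890-0800" and "promotion failure at 2011-02-03T09:53:35.165-0800". For a real log file, the list of lines could come from java.nio.file.Files.readAllLines.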

> -----Original Message-----
> From: charan kumar [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, February 03, 2011 11:47 AM
> To: [EMAIL PROTECTED]
> Subject: Region Servers Crashing during Random Reads
>
> Hello,
>
> I am using HBase 0.90.0 with hadoop-append on Dell 1950 hardware (2 CPUs,
> 6 GB RAM).
>
> I had 9 of my 30 region servers crash within a 30-minute span during heavy
> reads. It looks to me like a GC pause leading to a ZooKeeper connection
> timeout. I applied all the recommended configuration from the HBase wiki.
> Any other suggestions?
>
>
> 2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew (promotion failed): 5555K->5540K(5568K), 0.0280950 secs]70693.660: [CMS2011-02-03T09:43:16.864-0800: 70702.606: [CMS-concurrent-mark: 12.549/69.323 secs] [Times: user=11.90 sys=1.26, real=69.31 secs]
>
> 2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224: [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark: 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
>
>
> The following are the log entries from the region server:
>
> 2011-02-03 10:37:43,946 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 47172ms for sessionid 0x12db9f722421ce3, closing socket connection and attempting reconnect
> 2011-02-03 10:37:43,947 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 48159ms for sessionid 0x22db9f722501d93, closing socket connection and attempting reconnect
> 2011-02-03 10:37:44,401 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server XXXXXXXXXXXXXXXX
> 2011-02-03 10:37:44,402 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to XXXXXXXXX, initiating session
> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server XXXXXXXXXXXXXXX
> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to XXXXXXXXXXXXXXXXXXXXX, initiating session
> 2011-02-03 10:37:44,767 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 81.93 MB of total=696.25 MB
> 2011-02-03 10:37:44,784 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=81.94 MB, total=614.81 MB, single=379.98 MB, multi=309.77 MB, memory=0 KB
> 2011-02-03 10:37:45,205 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x22db9f722501d93 has expired, closing socket connection
> 2011-02-03 10:37:45,206 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, trying to reconnect.
> 2011-02-03 10:37:45,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Trying to reconnect to zookeeper
> 2011-02-03 10:37:45,206 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x12db9f722421ce3 has expired, closing socket connection
> gionserver:60020-0x22db9f722501d93 regionserver:60020-0x22db9f722501d93 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeep