-
Ec2 instability Rakhi Khatwani 2009-04-17, 16:39
Hi,

It's been several days since we started trying to stabilize hadoop/hbase on an ec2 cluster, but we have failed to do so. We still come across frequent region server failures, scanner timeout exceptions, OS level deadlocks, etc.

Today, while listing the tables on hbase, I got the following exception:

hbase(main):001:0> list
09/04/17 13:57:18 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:19 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:20 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not available yet, Zzzzz...
09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could not be reached after 1 tries, giving up.
09/04/17 13:57:21 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:22 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:23 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not available yet, Zzzzz...
09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could not be reached after 1 tries, giving up.
09/04/17 13:57:26 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:27 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:28 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not available yet, Zzzzz...
09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could not be reached after 1 tries, giving up.
09/04/17 13:57:29 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:30 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:31 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not available yet, Zzzzz...
09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could not be reached after 1 tries, giving up.
09/04/17 13:57:34 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:35 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:36 INFO ipc.HBaseClass: Retrying connect to server: /10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:36 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not available yet, Zzzzz...

But if I check the UI, the hbase master is still up (I tried refreshing it several times).

I have also been getting a lot of exceptions from time to time, including region servers going down (which happens very frequently and causes heavy data loss, and that on production data), scanner timeout exceptions, cannot allocate memory exceptions, etc.

I am working on an amazon ec2 Large cluster with 6 nodes, each node having the following hardware configuration:
- Large Instance: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform

I am using hadoop-0.19.0 and hbase 0.19.0 (resynced to all the nodes, and I made sure that there is a symbolic link to hadoop-site from hbase/conf).

Following is my configuration in hadoop-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://domU-12-31-39-00-E5-D2.compute-1.internal:50001</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>domU-12-31-39-00-E5-D2.compute-1.internal:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>

I have given this a high value since the RAM on each node is 7 GB; I am not sure about this setting though.
** I got a Cannot Allocate Memory exception after making this setting (I got it for the first time). After going through the archives, I saw that someone suggested enabling overcommit memory; I am not sure about that either. **

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

As suggested by some of you. I guess it solved the data xceivers exception on hadoop.

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>0</value>
    <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
  </property>

This property has been set because I have been getting a lot of exceptions of the form "Cannot report in 602 seconds....killing".

  <property>
    <name>mapred.tasktracker.expiry.interval</name>
    <value>360000</value>
    <description>Expert: The time-interval, in milliseconds, after which a tasktracker is declared 'lost' if it doesn't send heartbeats.</description>
  </property>
  <property>
    <name>dfs.datanode.
-
Re: Ec2 instability Rakhi Khatwani 2009-04-17, 16:44
Hi,
This is the exception I have been getting from mapreduce:

java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1199)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 10 more
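A note on reading the trace above: the "error=12" is the errno value returned by the operating system when the task JVM tried to start the "bash" child process (the trace shows it was launched from DF.getAvailable via Shell.runCommand). The short sketch below only looks that number up; it assumes a Linux host, where errno 12 is ENOMEM, and it is not part of Hadoop.

```python
import errno
import os

# errno value copied from the stack trace ("error=12").
ERROR_CODE = 12

# Symbolic name for the errno value on this platform.
name = errno.errorcode[ERROR_CODE]

# Human-readable message the C library associates with that value.
message = os.strerror(ERROR_CODE)

print(f"errno {ERROR_CODE} = {name}: {message}")
```

On Linux this should print the same "Cannot allocate memory" text that appears in the Java exception, which suggests the failure comes from the kernel refusing memory for the new process rather than from the Java heap itself.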
-
RE: Ec2 instability Ted Coyle 2009-04-17, 17:05
Rakhi,
I'd suggest going to 0.19.1 for both hbase and hadoop.

We had so many problems with 0.19.0 on EC2 that we couldn't use it. We are having problems with name resolution and the generic startup scripts in the 0.19.1 release, but they are not a show stopper.

Ted
-
Re: Ec2 instability Andrew Purtell 2009-04-18, 13:39
Hi,

This is an OS level exception. Your node is out of memory even to fork a process.

How many instances do you currently have allocated? Have you increased the number of instances over time to try and spread the load of your application around? How many concurrent mapper and/or reducer processes do you execute on a node? Can you characterize the memory usage of your mappers and reducers? Are you running other processes external to hadoop/hbase which consume a lot of memory? Are you running Ganglia or similar to track and characterize resource usage over time?

You may find you are trying to solve a 100 node problem with 10.

- Andy
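To make the memory question concrete, here is a rough worked calculation using the numbers from the first message in this thread (3 map slots, 3 reduce slots, -Xmx4096m, 7.5 GB per node). It is only a sketch: the amount reserved for the DataNode, TaskTracker, region server and OS (RESERVED_MB) is an assumed figure, not something stated in the thread, and -Xmx bounds only the Java heap, so real process sizes would be somewhat larger.

```python
# Values taken from the hadoop-site.xml and hardware description quoted
# in the first message of this thread.
NODE_RAM_MB = 7680      # 7.5 GB Large instance
MAP_SLOTS = 3           # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 3        # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = 4096    # mapred.child.java.opts = -Xmx4096m

# ASSUMPTION (not from the thread): memory set aside for the DataNode,
# TaskTracker, HBase region server and the operating system.
RESERVED_MB = 3072


def worst_case_heap_mb(map_slots, reduce_slots, child_heap_mb):
    """Heap the child task JVMs may claim if every slot is busy at once."""
    return (map_slots + reduce_slots) * child_heap_mb


def max_child_heap_mb(ram_mb, reserved_mb, map_slots, reduce_slots):
    """Largest per-task heap that keeps all slots within the remaining RAM."""
    return (ram_mb - reserved_mb) // (map_slots + reduce_slots)


demand = worst_case_heap_mb(MAP_SLOTS, REDUCE_SLOTS, CHILD_HEAP_MB)
budget = max_child_heap_mb(NODE_RAM_MB, RESERVED_MB, MAP_SLOTS, REDUCE_SLOTS)

print(f"worst-case task heap: {demand} MB on a {NODE_RAM_MB} MB node")
print(f"per-task heap that would fit under these assumptions: {budget} MB")
```

With all six slots busy, the configured heaps add up to 24576 MB, more than three times the physical memory of the node, which is consistent with the node being unable to fork a process. Under the assumed reservation, a per-task heap of about 768 MB would fit; the right value depends on what the tasks actually need.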
-
Re: Ec2 instability Rakhi Khatwani 2009-04-18, 17:12
Hi,
I have 6 instances allocated. I haven't tried adding more instances because I have a maximum of 30,000 rows in the hbase tables. What do you recommend?

I have at most 4-5 concurrent map/reduce tasks on one node.

How do we characterize the memory usage of mappers and reducers?

I am running spinn3r in addition to regular hadoop/hbase, but spinn3r is called from one of my map tasks.

I am not running Ganglia or any other program to characterize resource usage over time.

Thanks,
Raakhi
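On the question of how to characterize the memory usage of mappers and reducers: one simple starting point, short of installing Ganglia, is to sample the resident set size of the task JVMs from /proc on each node. The sketch below assumes a Linux host; the helper names (parse_vm_rss_kb, read_rss_kb) and the SAMPLE text are made up for illustration and are not part of Hadoop or HBase.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the resident set size of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


# Hypothetical excerpt of a /proc/<pid>/status file; the numbers are invented.
SAMPLE = "Name:\tjava\nVmPeak:\t 4300000 kB\nVmRSS:\t  812344 kB\nThreads:\t42\n"

rss_kb = parse_vm_rss_kb(SAMPLE)
print(f"resident memory: {rss_kb} kB")
```

Calling read_rss_kb with the pid of each child task JVM at intervals during a job, and recording the peak, would give a rough per-task figure to compare against the -Xmx setting and the node's physical memory.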
-
Re: Ec2 instability Tim Hawkins 2009-04-21, 12:38
I would be interested in understanding what problems you are having. We are using 0.19.0 in production on EC2, running nutch and a set of custom apps in a mixed workload on a farm of 5 instances.