-
HTable.put hangs on bulk loading ajay.gov 2011-04-07, 02:10
I am doing a load test for which I need to load a table with many rows. I have a small Java program that has a for loop and calls HTable.put. I am inserting a map of 2 items into a table that has one column family. The limit of the for loop is currently 20000. However, after 15876 rows the call to put hangs. I am using autoFlush on the HTable. Any ideas why this may happen?

The table configuration:
DESCRIPTION ENABLED
{NAME => 'TABLE2', FAMILIES => [{NAME => 'TABLE2_CF1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

The HBase config on the client is the one in the hbase-default.xml. Some values:
hbase.client.write.buffer=2097152
hbase.client.pause=1000
hbase.client.retries.number=10

If I use another client I am able to put items to the table. I am also able to scan items from the table using the hbase shell.

I have attached the server configuration.
I don't see anything in the region server or master logs. I have them here.

The master server log:
2011-04-06 19:02:40,149 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 184.106.69.238:60020, regionname: -ROOT-,,0.70236052, startKey: <>}
2011-04-06 19:02:40,152 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 184.106.69.238:60020, regionname: -ROOT-,,0.70236052, startKey: <>} complete
2011-04-06 19:02:40,157 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 42.0
2011-04-06 19:03:15,252 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 184.106.69.238:60020, regionname: .META.,,1.1028785192, startKey: <>}
2011-04-06 19:03:15,265 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 40 row(s) of meta region {server: 184.106.69.238:60020, regionname: .META.,,1.1028785192, startKey: <>} complete
2011-04-06 19:03:15,266 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned

The region server logs:
2011-04-06 19:02:21,294 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Creating region TABLE2,,1302141740486.010a5ae704ed53f656cbddb8e489162a.
2011-04-06 19:02:21,295 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined TABLE2,,1302141740486.010a5ae704ed53f656cbddb8e489162a.; next sequenceid=1

--
View this message in context: http://old.nabble.com/HTable.put-hangs-on-bulk-loading-tp31338874p31338874.html
Sent from the HBase User mailing list archive at Nabble.com.
-
Re: HTable.put hangs on bulk loading Jean-Daniel Cryans 2011-04-07, 17:29
There's nothing of use in the pasted logs unfortunately, and the log didn't get attached to your mail (happens often). Consider putting it on a web server or pastebin.

Also I see you are on an older version. Upgrading isn't going to fix your issue (which is probably related to your environment or configuration) but at least it's gonna be easier for us to support you.

J-D
-
Re: HTable.put hangs on bulk loading ajay.gov 2011-04-07, 19:20
Sorry, my server config was not attached. It's here:
http://pastebin.com/U41QZGiq

thanks
-ajay
-
Re: HTable.put hangs on bulk loading Ajay Govindarajan 2011-04-08, 00:35
Thanks for pointing this out. I have uploaded the server config at:
http://pastebin.com/U41QZGiq

thanks
-ajay
-
Re: HTable.put hangs on bulk loading Ajay Govindarajan 2011-04-08, 19:21
I used to call HTable.close on each put. I commented it out and I get the exception below (the program stops insertion at the exact same point, i.e. 15876 rows):

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:760)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1439)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:664)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:549)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:535)

I am not sure why this happens because I commit after every put.

Any help will be appreciated.

thanks
-ajay
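The trace in this message fails inside ThreadPoolExecutor.execute, that is, while the client was trying to start one more worker thread. One way a load loop can end up there is by building a new thread-backed object for every operation instead of reusing one. The sketch below is not HBase code and makes no claim about how HTable manages its threads in this version; it uses only java.util.concurrent, and every name in it (PoolReuseDemo, CountingFactory, threadsWithPoolPerOperation, threadsWithSharedPool) is invented for illustration. It simply counts how many threads get created under each pattern.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolReuseDemo {

    /** Thread factory that counts how many threads it has been asked to create. */
    static final class CountingFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            created.incrementAndGet();
            Thread t = new Thread(r);
            t.setDaemon(true); // do not keep the JVM alive for the demo
            return t;
        }
    }

    /** Shuts the pool down and waits briefly for its workers to exit. */
    private static void shutdownAndWait(ExecutorService pool) {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Anti-pattern (hypothetical): a fresh single-thread pool for every operation. */
    static int threadsWithPoolPerOperation(int operations) {
        CountingFactory factory = new CountingFactory();
        for (int i = 0; i < operations; i++) {
            ExecutorService pool = Executors.newFixedThreadPool(1, factory);
            pool.execute(() -> { });   // stand-in for one write
            shutdownAndWait(pool);     // skip this and every worker thread stays alive
        }
        return factory.created.get();
    }

    /** Preferred shape: one bounded pool shared by all operations. */
    static int threadsWithSharedPool(int operations, int poolSize) {
        CountingFactory factory = new CountingFactory();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, factory);
        for (int i = 0; i < operations; i++) {
            pool.execute(() -> { });   // stand-in for one write
        }
        shutdownAndWait(pool);
        return factory.created.get();
    }

    public static void main(String[] args) {
        System.out.println("pool per operation: " + threadsWithPoolPerOperation(200) + " threads created");
        System.out.println("shared pool:        " + threadsWithSharedPool(200, 4) + " threads created");
    }
}
```

In the first pattern the number of threads created grows with the number of operations, and if the per-operation pool is never shut down those threads also stay alive, which is the kind of growth that eventually hits a per-user thread limit. Whether that is what happened in this particular client is an assumption, not something the thread confirms.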
-
Re: HTable.put hangs on bulk loading Jean-Daniel Cryans 2011-04-08, 21:20
That exception means you are running out of threads on that whole machine. I wonder how you were able to get that... is hbase running on that machine too? I'd love to see your configuration but what you pasted is the hbase-default, which doesn't say anything since it's all the default values.

The newer FC releases have an awfully small setting for nproc. If that's your case you might want to bump that; google should tell you how on your specific system.

J-D
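To check whether a client is creeping toward a thread limit like the one described here, it can help to log the JVM's own thread count from inside the load program. The following is a minimal sketch using the standard java.lang.management API; the class name ThreadCountProbe and its helper methods are made up for this example, and the limit itself still has to be read with OS tools (for example `ulimit -u` in the shell that launches the client).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class ThreadCountProbe {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Number of live threads in this JVM right now (daemon and non-daemon). */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    /** Highest live thread count seen since the JVM started (or since the peak was reset). */
    static int peakThreads() {
        return THREADS.getPeakThreadCount();
    }

    /** Starts `extra` parked threads, samples the live count while they run, then releases them. */
    static int liveThreadsWhileHolding(int extra) {
        CountDownLatch started = new CountDownLatch(extra);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[extra];
        for (int i = 0; i < extra; i++) {
            threads[i] = new Thread(() -> {
                started.countDown();
                try {
                    release.await(); // park until the sample has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }
        int live;
        try {
            started.await();
            live = liveThreads();
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            live = liveThreads();
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println("live threads before: " + liveThreads());
        System.out.println("live threads while holding 50: " + liveThreadsWhileHolding(50));
        System.out.println("peak threads: " + peakThreads());
    }
}
```

Calling liveThreads() every few thousand rows in a load loop would show whether the count stays flat or climbs with the number of rows written.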
-
Re: HTable.put hangs on bulk loading Ajay Govindarajan 2011-04-08, 23:27
You were right! The limit was low when I logged in as a normal user. I ran the process with sudo and it works fine now.

Thanks so much for the help.
-ajay
-
Re: HTable.put hangs on bulk loading Stan Barton 2011-04-26, 10:25
I am running into a similar problem with HBase 0.90.2. My setup is 6 RSs, one master server, 3 ZK servers and about 20 clients (on different servers) feeding the database with inserts. After a few hours and around 2.5 million rows inserted, the process simply hangs with no error indication whatsoever (neither from ZK, the master, the regionservers nor the clients). All the inserting clients stop at virtually the same moment, but HBase itself is not down and can be queried.

In fact, the clients do not even crash with a timeout exception on their connections. I have always run into this problem when I attempted to run such an import process with the new HBase versions (even 0.90.1). Can anybody address this problem? Do others have similar problems?

I can provide further info about the configuration if needed.

Stan Barton
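When a client blocks forever without raising a timeout, as described in this message, one generic way to at least surface the hang is to run the blocking call on a worker thread and bound the wait with Future.get. The sketch below does that with plain java.util.concurrent. It is an illustration only: PutWithTimeout, runWithTimeout and simulatedHang are hypothetical names, the Runnable stands in for whatever call is hanging, and nothing here is taken from the HBase client. Cancelling the future only interrupts the worker, so a call that ignores interrupts would stay stuck in the background.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PutWithTimeout {

    /** Shared pool of daemon workers; idle threads are reused between calls. */
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call-worker");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs blockingCall on a worker thread and waits at most timeoutMillis.
     * Returns true if the call finished in time, false if it timed out.
     */
    static boolean runWithTimeout(Runnable blockingCall, long timeoutMillis) {
        Future<?> future = WORKERS.submit(blockingCall);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; has no effect if the call ignores interrupts
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new RuntimeException("call failed", e.getCause());
        }
    }

    /** Stand-in for a call that never returns on its own. */
    static void simulatedHang() {
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast call finished in time: " + runWithTimeout(() -> { }, 2_000));
        System.out.println("hung call finished in time: " + runWithTimeout(PutWithTimeout::simulatedHang, 200));
    }
}
```

On a timeout the caller can log the row it was writing and take a thread dump of the client, which gives the list something concrete to look at instead of a silent stall.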
-
Re: HTable.put hangs on bulk loading Michel Segel 2011-04-26, 15:43
How many regions on a region server?
Table splits?
GC tuning parameters?
HBase heap size?

All of these questions could be important...

Mike Segel
-
Re: HTable.put hangs on bulk loadingajay.gov 2011-04-26, 17:17
Hi,

I posted the same message on the [EMAIL PROTECTED] mailing list and Jean-Daniel Cryans suggested I increase the nproc limit on the client machines. I did it and it fixed the problem.

-ajay
-
Re: HTable.put hangs on bulk loadingJeff Whiting 2011-04-26, 18:36
Would it make sense to do some kind of sanity check on these various configuration parameters when a region or master server starts? It seems like there are a lot of them, and when they aren't right it can cause big problems. Just have it check the configuration parameters and output a warning in the log. E.g. log.warn("Warning, OS setting may be too low: ulimit should be at least X. Consider changing it.");

Also, if that were to show up in the Master.jsp it would be even better.

~Jeff

Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
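A startup check along the lines Jeff describes could be sketched as follows. This is only an illustration, not HBase code: the class name `LimitCheck`, the minimum values (4096) and the use of Linux's `/proc/self/limits` as the source of the limits are all assumptions made for the example, and it prints to stderr instead of using a logger so that it stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
final class LimitCheck {

    /**
     * Scans text laid out like Linux's /proc/<pid>/limits and returns one
     * warning for each matching line whose soft limit is below the minimum.
     */
    static List<String> check(String limitsText, String limitName, long minimum) {
        List<String> warnings = new ArrayList<>();
        for (String line : limitsText.split("\n")) {
            if (!line.startsWith(limitName)) {
                continue;
            }
            // Columns after the name are: soft limit, hard limit, units.
            String soft = line.substring(limitName.length()).trim().split("\\s+")[0];
            if (soft.isEmpty() || soft.equals("unlimited")) {
                continue;
            }
            long value = Long.parseLong(soft);
            if (value < minimum) {
                warnings.add("Warning, OS setting may be too low: " + limitName
                        + " is " + value + ", should be at least " + minimum
                        + ". Consider changing it.");
            }
        }
        return warnings;
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits"); // Linux only
        if (!Files.isReadable(limits)) {
            System.err.println("Cannot read " + limits + "; skipping limit check.");
            return;
        }
        String text = new String(Files.readAllBytes(limits), StandardCharsets.UTF_8);
        List<String> warnings = new ArrayList<>();
        // 4096 is a placeholder threshold, not a recommended value.
        warnings.addAll(check(text, "Max processes", 4096));
        warnings.addAll(check(text, "Max open files", 4096));
        for (String w : warnings) {
            System.err.println(w);
        }
    }
}
```

The same list of warnings could just as well be rendered on a status page; the parsing is kept separate from the reporting for that reason.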
-
Re: HTable.put hangs on bulk loadingStack 2011-04-26, 18:41
On Tue, Apr 26, 2011 at 11:36 AM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
> Would it make sense to do some kind of sanity check on these various configuration parameters when a region or master server starts? It seems like there are a lot of them, and when they aren't right it can cause big problems. Just have it check the configuration parameters and output a warning in the log. E.g. log.warn("Warning, OS setting may be too low: ulimit should be at least X. Consider changing it.");
>
> Also, if that were to show up in the Master.jsp it would be even better.

We output the ulimit seen by HBase as the first thing in the log (we recently changed it so it lists all ulimits, so we now show nproc). But obviously this is not enough, given the frequency with which these configs are missed, even though they are the first thing in our requirements list.

Let's add a warning to the UI as you suggest, Jeff (we already have ones for when you are running on jdk 1.6.0 u18 -- we can extend this code). Mind filing an issue, Jeff? Mark it against 0.92.0, though there is no reason it shouldn't be added to 0.90.3.

Thanks,
St.Ack
-
Re: HTable.put hangs on bulk loadingStan Barton 2011-04-27, 09:30
Hi,

what does "increase" mean? I checked the client machines and the nproc limit is around 26k, which seems sufficient. The same limit applies on the db machines...

Stan
-
Re: HTable.put hangs on bulk loadingStack 2011-04-27, 16:40
On Wed, Apr 27, 2011 at 2:30 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
> Hi,
>
> what does "increase" mean? I checked the client machines and the nproc limit is around 26k, which seems sufficient. The same limit applies on the db machines...

The nproc and ulimits are 26k for the user who is running the hadoop/hbase processes?

You checked the .out files? You can pastebin your configuration and we'll take a look at it.

Sounds like the hang is in the client if you can still get to the cluster from a new shell....

As Mike says, tell us more about your context. How many regions on each server? What is your payload like?

Thanks,
St.Ack
-
Re: HTable.put hangs on bulk loadingStan Barton 2011-04-28, 13:54
Yes, these high limits are for the user running the hadoop/hbase processes.

The systems run on a cluster of 7 machines (1 master, 6 slaves). Each has one processor, two cores and 3.5GB of memory. I am using about 800MB for hadoop (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There is 6TB on four disks per machine. Three zookeepers. The database contains more than 3500 regions, and the table that was being fed already had about 300 regions. The table was fed incrementally using HTable.put(). The data are documents ranging in size from a few bytes to megabytes, with an upper limit of 10MB per inserted doc.

The configuration files:

hadoop/core-site.xml http://pastebin.ca/2051527
hadoop/hadoop-env.sh http://pastebin.ca/2051528
hadoop/hdfs-site.xml http://pastebin.ca/2051529
hbase/hbase-site.xml http://pastebin.ca/2051532
hbase/hbase-env.sh http://pastebin.ca/2051535

Because the nproc was high, I inspected the .out files of the RSs and found one which indicated that all the IPCs OOMEd; unfortunately I don't have those because they got overwritten after a cluster restart. So that means that it was OK on the client side. The funny thing is that all RS processes were up and running; only the one with the OOMEd IPCs did not really communicate (after trying to restart the importing process, no inserts went through). So the cluster seemed OK - I was storing statistics that were apparently served by another RS, and those were also listed OK. As I mentioned, the log of the bad RS did not mention that anything wrong had happened.

My observation was: the regions were spread over all RSs, but the crashed RS served the most of them, about half again as many as any other, and was therefore accessed more than the others. I have already discussed the load balancing in HBase 0.90.2 with Ted Yu. The balancer needs to be tuned, I guess, because when the table is created and loaded from scratch, the regions of the table are not balanced equally (in terms of numbers) across the cluster, and I guess the RS that hosted the very first region ends up serving the majority of the regions as they are split. That imposes a larger load on that RS, which makes it more prone to failures (like my OOME) and kills performance. I have resumed the process after rebalancing the regions beforehand, and I achieved a higher data ingestion rate and also did not run into the OOME on one RS. Right now I am trying to reproduce the incident.

I know that my scenario would require better machines, but those are what I have now, and I am running stress tests before production. In comparison with 0.20.6, 0.90.2 is less stable regarding insertion, but it scales sub-linearly (v0.20.6 did not scale on my data) in terms of random access queries (including multi-versioned data) - I have done an extensive comparison regarding this.

Stan
-
Re: HTable.put hangs on bulk loadingStack 2011-05-07, 21:51
On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
> Yes, these high limits are for the user running the hadoop/hbase processes.
>
> The systems run on a cluster of 7 machines (1 master, 6 slaves). Each has one processor, two cores and 3.5GB of memory. I am using about 800MB for hadoop (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There is 6TB on four disks per machine. Three zookeepers. The database contains more than 3500 regions, and the table that was being fed already had about 300 regions. The table was fed incrementally using HTable.put(). The data are documents ranging in size from a few bytes to megabytes, with an upper limit of 10MB per inserted doc.

Are you swapping, Stan? You are close to the edge with your RAM allocations. What do you have swappiness set to? Is it the default?

For writing you don't usually need that much memory, but you do have a lot of regions, so you could be flushing a bunch -- a bunch of small files.

> The configuration files:
>
> hadoop/core-site.xml http://pastebin.ca/2051527
> hadoop/hadoop-env.sh http://pastebin.ca/2051528

Your HADOOP_CLASSPATH is a little odd. You are doing * on jar directories. Does that work?

This CLASSPATH mentions nutch and a bunch of other stuff. Are you running just datanodes on these machines, or tasktrackers and mapreduce too?

These are old IA stock machines? Do they have ECC RAM? (IIRC, they used to not have ECC RAM).

> hadoop/hdfs-site.xml http://pastebin.ca/2051529

Did you change the dfs block size? Looks like it's 256M rather than the usual 64M. Any reason for that? Would suggest going w/ defaults at first.

Remove dfs.datanode.socket.write.timeout == 0. That's an old config. recommendation that should no longer be necessary and is likely corrosive.

> hbase/hbase-site.xml http://pastebin.ca/2051532

You are letting major compactions run every 24 hours. You might want to turn them off and then manage the major compactions to happen during downtimes. They have a knack of cutting in just when you don't want them to; e.g. when you are under peak loading.

You have upped the flush size above the default; i.e. hbase.hregion.memstore.flush.size. This will put more pressure on RAM when I'd think you would want to have less, since you are poor when it comes to RAM.

You have upped your regionsize above the default. That is good, I'd say. You might want to 4x -- go to 4G -- since you are doing relatively big stuff.

You should send me the metrics that show at the top of the regionserver UI when you are under load. I'd like to see things like how much of your RAM is given over to indices for these storefiles.

I see you have hdfs.block.size specified in here at 256M. So stuff written by hbase into hdfs will have a block size of 256M. Any reason to do this? I'd say leave it at the default unless you have a really good reason to do otherwise (remove this config. from this file).

> hbase/hbase-env.sh http://pastebin.ca/2051535

Remove this: -XX:+HeapDumpOnOutOfMemoryError. It means the JVM will dump the heap when it hits an OutOfMemoryError. This is probably of no interest to you and could actually cause you pain if you have a small root file system and the heap dump fills it.

The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless (default is 92% or 88% -- I don't remember which). Set it down to 80% or 75% if you want it to actually make a difference.

Are you having issues w/ GC'ing? I see you have mslab enabled.

> Because the nproc was high, I inspected the .out files of the RSs and found one which indicated that all the IPCs OOMEd; unfortunately I don't have those because they got overwritten after a cluster restart.

This may have been because of HBASE-3813. See if 0.90.3 helps (there is an improvement here).

Next time, let us see them.

> So that means that it was OK on the client side. The funny thing is that all RS processes were up and running; only the one with the OOMEd IPCs did not really communicate (after trying to restart the importing process, no inserts went through).

An OOME'd process goes wonky thereafter and acts in irrational ways. Perhaps this was why it stopped taking on requests.

OK.

Yeah, Ted has been arguing that the balancer should be table-conscious in that it should try and spread tables across the cluster. Currently it's not. All regions are equal in the balancer's eyes. 0.90.2 didn't help? (Others have since reported that they think, as Ted does, that the table a region comes from should be considered when balancing).

0.90.2 scales sub-linearly? You mean it's not linear but more machines help? Are your random reads truly random (I'd guess they are)? Would cache help?

Have you tried the hdfs-237 patch? There is a version that should work for your version of cdh. It could make a big difference (though, beware, the latest posted patch does not do checksumming, and if your hardware does not have ECC, it could be a problem).

St.Ack
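Stack's remark that the RAM allocations are close to the edge can be checked with quick arithmetic on the figures Stan gave (3.5GB per machine, about 800MB for hadoop, 2.1GB for HBase, more than 3500 regions on 6 region servers). The following is a rough sketch only: it assumes 1GB = 1024MB, treats the quoted sizes as the full memory use of each process (a JVM typically needs more than its configured heap), and the class and method names are made up for the example.

```java
import java.util.Locale;

// Hypothetical helper for back-of-the-envelope sizing; not part of HBase.
final class MemoryBudget {

    /** Returns the megabytes left after subtracting every allocation from the total. */
    static double headroomMb(double totalMb, double... allocationsMb) {
        double used = 0;
        for (double mb : allocationsMb) {
            used += mb;
        }
        return totalMb - used;
    }

    public static void main(String[] args) {
        double totalMb = 3.5 * 1024;  // 3.5GB of RAM per machine
        double hadoopMb = 800;        // "about 800MB for hadoop"
        double hbaseMb = 2.1 * 1024;  // "2.1GB for HBase"
        double left = headroomMb(totalMb, hadoopMb, hbaseMb);
        System.out.println(String.format(Locale.ROOT,
                "Left for the OS and everything else: %.1f MB", left));
        // "more than 3500 regions" spread over 6 region servers
        System.out.println(String.format(Locale.ROOT,
                "Average regions per region server: %.0f", 3500 / 6.0));
    }
}
```

With these inputs roughly 630MB is left for the operating system and everything else, and each region server carries on average more than 580 regions.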
-
Re: HTable.put hangs on bulk loadingStan Barton 2011-05-13, 14:44
stack-3 wrote:
>
> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>>
>> Yes, these high limits are for the user running the hadoop/hbase processes.
>>
>> The systems run on a cluster of 7 machines (1 master, 6 slaves). Each has one processor, two cores and 3.5GB of memory. I am using about 800MB for hadoop (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There is 6TB on four disks per machine. Three zookeepers. The database contains more than 3500 regions, and the table that was being fed already had about 300 regions. The table was fed incrementally using HTable.put(). The data are documents ranging in size from a few bytes to megabytes, with an upper limit of 10MB per inserted doc.
>
> Are you swapping, Stan? You are close to the edge with your RAM allocations. What do you have swappiness set to? Is it the default?
>
> For writing you don't usually need that much memory, but you do have a lot of regions, so you could be flushing a bunch -- a bunch of small files.

Due to various problems with swap, swap was turned off and memory overcommitment was turned on.

stack-3 wrote:
>
>> The configuration files:
>>
>> hadoop/core-site.xml http://pastebin.ca/2051527
>> hadoop/hadoop-env.sh http://pastebin.ca/2051528
>
> Your HADOOP_CLASSPATH is a little odd. You are doing * on jar directories. Does that work?
>
> This CLASSPATH mentions nutch and a bunch of other stuff. Are you running just datanodes on these machines, or tasktrackers and mapreduce too?
>
> These are old IA stock machines? Do they have ECC RAM? (IIRC, they used to not have ECC RAM).

Strangely, on these machines and the Debian installed on them, only this (star *) approach works. Originally, I was running the DB on the same cluster where the processing took place - mostly mapreduce jobs reading the data and doing some analysis. But when I started using nutchwax on the same cluster I started running out of memory (on the mapreduce side), and since the machines are so sensitive (no swap, and overcommitment) that became a nightmare. So right now nutch is being run on a separate cluster - I have tweaked nutchwax to work with recent Hadoop APIs and also to take the content stored in hbase as the input (instead of ARC files).

The machines are somehow renovated old red boxes (I don't know what configuration they originally had). The RAM is not ECC as far as I know, because the chipset on the motherboards does not support that technology.

stack-3 wrote:
>
>> hadoop/hdfs-site.xml http://pastebin.ca/2051529
>
> Did you change the dfs block size? Looks like it's 256M rather than the usual 64M. Any reason for that? Would suggest going w/ defaults at first.
>
> Remove dfs.datanode.socket.write.timeout == 0. That's an old config. recommendation that should no longer be necessary and is likely corrosive.

I changed the block size to reduce the overall number of blocks. I was following some advice on managing that large an amount of data in HDFS that I found in the forums.

As for dfs.datanode.socket.write.timeout, that was set because I was quite often observing timeouts on the DFS sockets, and by digging around I found out that for some reason the internal Java times of the connecting machines were not aligned (even though the hw clocks were). I think there was a JIRA for that.

stack-3 wrote:
>
>> hbase/hbase-site.xml http://pastebin.ca/2051532
>
> You are letting major compactions run every 24 hours. You might want to turn them off and then manage the major compactions to happen during downtimes. They have a knack of cutting in just when you don't want them to; e.g. when you are under peak loading.
>
> You have upped the flush size above the default; i.e. hbase.hregion.memstore.flush.size. This will put more pressure on RAM when I'd think you would want to have less, since you are poor when it comes to RAM.
>
> You have upped your regionsize above the default. That is good, I'd say.

Again, raising the block size was motivated by the assumption that it would lower the overall number of blocks. If it imposes stress on the RAM, it makes sense to leave it at the defaults. I guess it also helps the parallelization.

stack-3 wrote:

On version 0.20.6 I saw long pauses during the importing phase and also when querying. I was measuring how many queries were processed per second and could see pauses in the throughput. The only culprit I could find was the GC, but I still could not figure out why it pauses the whole DB. Therefore I gave mslab a shot with 0.90, but I still see those pauses in the throughput.

stack-3 wrote:

In fact, what I still see in 0.90.2 is that when I start inserting into an empty table and the number of regions in this table rises (more than 100), they are spread all over the cluster (good), but one RS (the one holding the first region) serves remarkably more regions than the rest of the RSs, which kills the performance of the whole cluster and puts a lot of stress on this one RS (there was no RS downtime, and the overall region numbers are even on all RSs).

stack-3 wrote:

I have done some tests using random access queries and multi-versioned data (10 to 50 different timestamps per datum) and found that random access in v0.20.6 degrades linearly with the number of versions; in the case of 0.90, some slowdown was recorded, but it was sub-linear. This was still with the same number of machines. The reads were random; I pre-selected the rows from the whole collection. The cache helped; I could see in the pattern the time it took to serve a query's answer from disk and from cache.

Are you sure that you have suggested the right patch (hdfs-237)? It mentions dfsadmin... And no, the machines do not have ECC-enabled RAM.

Stan
-
Re: HTable.put hangs on bulk loadingTed Yu 2011-05-13, 15:04
>> when the number of the regions in this table rises (more than 100), they are spread all over the cluster (good)

Can you clarify the above a bit more? If you use stock version 0.90.2, the random selector wouldn't guarantee that the regions of this table get distributed.
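The situation discussed here, where overall region counts are even on every region server while one table's regions pile up on a single server, can be illustrated with a small counting sketch. It does not use any HBase API: the class `TableSpread`, the table names and the server names are all invented for the example, and the assignment list is made-up data.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; the real balancer works on HBase's own structures.
final class TableSpread {

    /**
     * Counts how many regions of one table each server hosts.
     * Each assignment is a pair: {table name, server name}.
     * Servers hosting no region of the table are reported with a count of 0.
     */
    static Map<String, Integer> regionsPerServer(String[][] assignments, String table) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] assignment : assignments) {
            String regionTable = assignment[0];
            String server = assignment[1];
            counts.putIfAbsent(server, 0);
            if (regionTable.equals(table)) {
                counts.merge(server, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Both servers host 4 regions in total, so a count-only view looks balanced.
        String[][] assignments = {
            {"old", "rs1"}, {"new", "rs1"}, {"new", "rs1"}, {"new", "rs1"},
            {"old", "rs2"}, {"old", "rs2"}, {"old", "rs2"}, {"new", "rs2"},
        };
        // The per-table view shows the skew: prints "new table: {rs1=3, rs2=1}"
        System.out.println("new table: " + regionsPerServer(assignments, "new"));
    }
}
```

If all the write traffic goes to the "new" table, rs1 takes three quarters of the load even though both servers hold the same number of regions.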
-
Re: HTable.put hangs on bulk loading
Stack 2011-05-13, 19:34
On Fri, May 13, 2011 at 7:44 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
> stack-3 wrote:
>> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>> Are you swapping Stan? You are close to the edge with your RAM
>> allocations. What do you have swappyness set to? Is it default?
>>
>> Writing you don't need that much memory usually but you do have a lot
>> of regions so you could be flushing a bunch, a bunch of small files.
>
> Due to various problems with swap, the swap was turned off and the
> overcommitment of the memory was turned on.

Sorry. How do you enable overcommitment of memory, or do you mean to say that your processes add up to more than the RAM you have?

> stack-3 wrote:
>> These are old IA stock machines? Do they have ECC RAM? (IIRC, they
>> used to not have ECC RAM).
>
> Strangely, on the machines and the debian installed, only this (star * )
> approach works.

OK. New to me, but hey, what do I know!

> Originally, I was running the DB on the same cluster as the
> processing took place - mostly mapreduce jobs reading the data and doing
> some analysis. But when I started using nutchwax on the same cluster I
> started running out of memory (on the mapreduce side) and since the machines
> are so sensitive (no swap and overcommitment) that became a nightmare. So
> right now the nutch is being ran on a separate cluster - I have tweaked
> nutchwax to work with recent Hadoop apis and also to take the hbase stored
> content on as the input (instead of ARC files).

Good stuff

> The machines are somehow renovated old red boxes (I dont know what
> configuration they were originally). The RAM is not an ECC as far as I know,
> because the chipset on the motherboards does not support that technology.

OK. You seeing any issues arising because of checksum issues? (BTW, IIRC, these non-ECC red boxes are the reason HDFS is a checksummed filesystem)

> stack-3 wrote:
>>> hadoop/hdfs-site.xml http://pastebin.ca/2051529
>>
>> Did you change the dfs block size? Looks like its 256M rather than
>> usual 64M. Any reason for that? Would suggest going w/ defaults at
>> first.
>>
>> Remove dfs.datanode.socket.write.timeout == 0. Thats an old config.
>> recommendation that should no longer be necessary and is likely
>> corrosive.
>
> I have changed the size of the block, to diminish the overall number of
> blocks. I was following some advices regarding managing that large amount of
> data in HDFS that I found in the fora.

Yeah, I suppose, bigger blocksizes would make it so you need less RAM in your namenode. You have lots of files on here? On the other side, bigger blocks are harder for hbase to sling.

> As for the dfs.datanode.socket.write.timeout, that was set up because I was
> observing quite often timeouts on the DFS sockets, and by digging around, I
> have found out, that for some reason the internal java times were not
> aligned of the connecting machines (even though the hw clock were), I think
> there was a JIRA for that.

Not sure what this one is about. The dfs.datanode.socket.write.timeout=0 is old lore by this stage I think you'll find.

> Again, the reason to upper the block size was motivated by the assumption of
> lowering the overall number of blocks. If it imposes stress on the RAM it
> makes sense to leave it on the defaults. I guess it also helps the
> parallelization.

Yeah, would suggest you run w/ default sizes.

> stack-3 wrote:
>>> hbase/hbase-env.sh http://pastebin.ca/2051535
>>
>> Remove this:
>>
>> -XX:+HeapDumpOnOutOfMemoryError
>>
>> Means it will dump heap if JVM crashes. This is probably of no
>> interest to you and could actually cause you pain if you have small
>> root file system if the heap dump causes you to fill.
>>
>> The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless
>> (default is 92% or 88% -- I don't remember which). Set it down to 80%
>> or 75% if you want it to actually make a difference.
Importing, yeah, you are probably running into the 'gate' that a regionserver puts up when it has filled its memstore while waiting on flush to complete. Check regionserver logs at about this time. You should see 'blocking' messages followed soon after by unblocking after the flush runs.

St.Ack
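One way to follow the advice about checking the regionserver logs is to filter them for the memstore gate messages. The sketch below is a minimal, hedged example: it assumes the relevant lines contain the phrases "Blocking updates" and "Unblocking updates", which may vary between HBase versions, and `find_gate_events` is a hypothetical helper name rather than anything shipped with HBase.

```python
import re

# Assumption: the memstore gate messages contain "Blocking updates" or
# "Unblocking updates". Adjust the pattern if your HBase version words
# these messages differently.
GATE_PATTERN = re.compile(r"\b(Blocking|Unblocking) updates\b", re.IGNORECASE)


def find_gate_events(lines):
    """Yield (line_number, kind, text) for every log line that matches.

    `lines` is any iterable of strings, for example an open log file.
    `kind` is "blocking" or "unblocking".
    """
    for number, line in enumerate(lines, start=1):
        match = GATE_PATTERN.search(line)
        if match:
            yield number, match.group(1).lower(), line.rstrip("\n")
```

Passing an open regionserver log file to `find_gate_events` and printing the results in order should show whether each blocking message is followed shortly by an unblocking one, which is the pattern described above.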
-
Re: HTable.put hangs on bulk loading
Stan Barton 2011-05-16, 11:55
stack-3 wrote:
> On Fri, May 13, 2011 at 7:44 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>> stack-3 wrote:
>>> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>>> Are you swapping Stan? You are close to the edge with your RAM
>>> allocations. What do you have swappyness set to? Is it default?
>>>
>>> Writing you don't need that much memory usually but you do have a lot
>>> of regions so you could be flushing a bunch, a bunch of small files.
>>
>> Due to various problems with swap, the swap was turned off and the
>> overcommitment of the memory was turned on.
>
> Sorry. How do you enable overcommitment of memory, or do you mean to
> say that your processes add up to more than the RAM you have?

The memory overcommitment is needed because in order to let java still "allocate" the memory for executing external bash commands like "du" when the RAM is nearly filled up. I have the swap turned off and have turned the overcommitment using sysctl and setting vm.overcommit_memory=0 (i.e. the option when any memory allocation attempt will succeed no matter the resting free RAM). I was encountering RS crashed caused by the "java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory". However, my processes should never add up more than the available RAM-the minimum for OS.

stack-3 wrote:
>> stack-3 wrote:
>>> These are old IA stock machines? Do they have ECC RAM? (IIRC, they
>>> used to not have ECC RAM).
>>
>> Strangely, on the machines and the debian installed, only this (star * )
>> approach works.
>
> OK. New to me, but hey, what do I know!
>
>> Originally, I was running the DB on the same cluster as the
>> processing took place - mostly mapreduce jobs reading the data and doing
>> some analysis. But when I started using nutchwax on the same cluster I
>> started running out of memory (on the mapreduce side) and since the machines
>> are so sensitive (no swap and overcommitment) that became a nightmare. So
>> right now the nutch is being ran on a separate cluster - I have tweaked
>> nutchwax to work with recent Hadoop apis and also to take the hbase stored
>> content on as the input (instead of ARC files).
>
> Good stuff
>
>> The machines are somehow renovated old red boxes (I dont know what
>> configuration they were originally). The RAM is not an ECC as far as I know,
>> because the chipset on the motherboards does not support that technology.
>
> OK. You seeing any issues arising because of checksum issues? (BTW,
> IIRC, these non-ECC red boxes are the reason HDFS is a checksummed
> filesystem)

How would these manifest? I guess that is not related but on the same note, I am encountering a quite high disk failure on machines running HBase/HDFS.

stack-3 wrote:
>> stack-3 wrote:
>>>> hadoop/hdfs-site.xml http://pastebin.ca/2051529
>>>
>>> Did you change the dfs block size? Looks like its 256M rather than
>>> usual 64M. Any reason for that? Would suggest going w/ defaults at
>>> first.
>>>
>>> Remove dfs.datanode.socket.write.timeout == 0. Thats an old config.
>>> recommendation that should no longer be necessary and is likely
>>> corrosive.
>>
>> I have changed the size of the block, to diminish the overall number of
>> blocks. I was following some advices regarding managing that large amount of
>> data in HDFS that I found in the fora.
>
> Yeah, I suppose, bigger blocksizes would make it so you need less RAM
> in your namenode. You have lots of files on here? On the other side,
> bigger blocks are harder for hbase to sling.

In general, the HDFS contains only HBase files, so at this point the memory consumption on NN is not an issue, so I have lowered that back to the defaults and will observe.
stack-3 wrote:
>> As for the dfs.datanode.socket.write.timeout, that was set up because I was
>> observing quite often timeouts on the DFS sockets, and by digging around, I [...]

I should have noted the cause of the problem better. I will remove that setting and observe whether I get the socket exceptions again.

stack-3 wrote:
> [...]

For the import I can understand, but when I am evaluating the querying performance, almost no writes (besides small statistics data) are going on and the HBase pauses as a whole, not only one RS (which I would believe is the case when writes were flushed in the statistics table having one region).
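On the vm.overcommit_memory setting discussed in this message: the Linux kernel documentation describes three modes, and in that documentation 0 is the heuristic default while 1 is the mode in which allocation requests are never refused, so the value in effect on the affected hosts may be worth double-checking. The sketch below is only an illustration; `describe_overcommit` and `read_overcommit` are hypothetical helper names, and reading the procfs path assumes a Linux host.

```python
# Meanings of vm.overcommit_memory as described in the Linux kernel's
# overcommit accounting documentation.
OVERCOMMIT_MODES = {
    0: "heuristic: obvious overcommits are refused (kernel default)",
    1: "always: allocation requests are never refused",
    2: "strict: commit limit is swap plus overcommit_ratio percent of RAM",
}


def describe_overcommit(raw):
    """Return a human-readable description for a raw sysctl value."""
    mode = int(str(raw).strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the current mode from procfs (Linux only)."""
    with open(path) as handle:
        return describe_overcommit(handle.read())
```

Calling `read_overcommit()` on a regionserver host reports which mode is actually active, which helps when diagnosing "error=12, Cannot allocate memory" failures on fork.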
-
Re: HTable.put hangs on bulk loading
Stack 2011-05-16, 18:30
On Mon, May 16, 2011 at 4:55 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>> Sorry. How do you enable overcommitment of memory, or do you mean to
>> say that your processes add up to more than the RAM you have?
>
> The memory overcommitment is needed because in order to let java still
> "allocate" the memory for executing external bash commands like "du" when
> the RAM is nearly filled up. I have the swap turned off and have turned the
> overcommitment using sysctl and setting vm.overcommit_memory=0 (i.e. the
> option when any memory allocation attempt will succeed no matter the resting
> free RAM). I was encountering RS crashed caused by the "java.io.IOException:
> Cannot run program "bash": java.io.IOException: error=12, Cannot allocate
> memory". However, my processes should never add up more than the available
> RAM-the minimum for OS.

If it happens again, can I see stack trace for the above?

> How would these manifest? I guess that is not related but on the same note,
> I am encountering a quite high disk failure on machines running HBase/HDFS.

If all worked as designed, you'd not see anything. A corrupted block would be put aside and a new replica made from a good replica would take its place. But IIRC, corruption rate was really high on these machines. Do you ever run into files missing blocks?

Yeah, the disks were cheapies.

Any chance of different hardware?

> In general, the HDFS contains only HBase files, so at this point the memory
> consumption on NN is not an issue, so I have lowered that back to the
> defaults and will observe.

Yeah, this is probably better. You are now like most others on this list.

> For the import I can understand, but when I am evaluating the querying
> performance, almost no writes (besides small statistics data) are going on
> and the HBase pauses as a whole, not only one RS (which I would believe is
> the case when writes were flushed in the statistics table having one
> region).

Lets try and dig in on this. This shouldn't be happening. Anything in regionserver logs at the time of the pause?

St.Ack
-
Re: HTable.put hangs on bulk loading
Stan Barton 2011-05-18, 12:59
stack-3 wrote:
> On Mon, May 16, 2011 at 4:55 AM, Stan Barton <[EMAIL PROTECTED]> wrote:
>>> Sorry. How do you enable overcommitment of memory, or do you mean to
>>> say that your processes add up to more than the RAM you have?
>>
>> The memory overcommitment is needed because in order to let java still
>> "allocate" the memory for executing external bash commands like "du" when
>> the RAM is nearly filled up. I have the swap turned off and have turned the
>> overcommitment using sysctl and setting vm.overcommit_memory=0 (i.e. the
>> option when any memory allocation attempt will succeed no matter the resting
>> free RAM). I was encountering RS crashed caused by the "java.io.IOException:
>> Cannot run program "bash": java.io.IOException: error=12, Cannot allocate
>> memory". However, my processes should never add up more than the available
>> RAM-the minimum for OS.
>
> If it happens again, can I see stack trace for the above?
>
>> How would these manifest? I guess that is not related but on the same note,
>> I am encountering a quite high disk failure on machines running HBase/HDFS.
>
> If all worked as designed, you'd not see anything. A corrupted block
> would be put aside and a new replica made from a good replica would
> take its place. But IIRC, corruption rate was really high on these
> machines. Do you ever run into files missing blocks?
>
> Yeah, the disks were cheapies.
>
> Any chance of different hardware?

Yes, the HW changed I guess, the disks right now are 4xSATA seagates 1.5TB and the RAM was added (3.5GB total) and the motherboards are supermicro x7sla. The disks are still no enterprise entry level 7.2k rpm. Sometimes I am running into block not available when doing distCp, but the block is there, it is ok but is just not available for some reason - when doing exactly the same copy second time, the errors appear on different blocks.
stack-3 wrote:
>> In general, the HDFS contains only HBase files, so at this point the memory
>> consumption on NN is not an issue, so I have lowered that back to the
>> defaults and will observe.
>
> Yeah, this is probably better. You are now like most others on this list.
>
>> For the import I can understand, but when I am evaluating the querying
>> performance, almost no writes (besides small statistics data) are going on
>> and the HBase pauses as a whole, not only one RS (which I would believe is
>> the case when writes were flushed in the statistics table having one
>> region).
>
> Lets try and dig in on this. This shouldn't be happening. Anything
> in regionserver logs at the time of the pause?

I replayed the query tests, the query throughput is depicted in the following figure:
http://old.nabble.com/file/p31646631/query_throughput_agg.jpg
the green line denotes the number of simultaneously querying clients, the red columns for queries processed in the particular second. The x-axis is the number of seconds from start of the experiment. Gaps in the throughput denote the problems.

I have looked in the logs and it seems that the culprit is the HDFS: snippets of the logs of RS1 and RS2 where I found errors around one of the gaps:
http://pastebin.ca/2063454
I have included also snippets of logs to datanodes to where the RS errors pointed.
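The gaps in the throughput figure can also be located programmatically, which makes it easier to match them against regionserver and datanode log timestamps. This is a minimal sketch, assuming the per-second counts of processed queries are available as a plain list indexed by seconds since the start of the experiment; the input format and the `find_gaps` name are hypothetical and not taken from the test harness used in this thread.

```python
def find_gaps(counts, min_length=1):
    """Return (start_second, end_second) for each run of zero throughput.

    `counts[i]` is the number of queries processed in second `i`.
    Runs shorter than `min_length` seconds are ignored.
    """
    gaps = []
    start = None  # first second of the zero run currently being tracked
    for second, count in enumerate(counts):
        if count == 0:
            if start is None:
                start = second
        elif start is not None:
            if second - start >= min_length:
                gaps.append((start, second - 1))
            start = None
    # Close a run that extends to the end of the measurement.
    if start is not None and len(counts) - start >= min_length:
        gaps.append((start, len(counts) - 1))
    return gaps
```

Raising `min_length` filters out single idle seconds so that only sustained pauses are reported.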