-
fastest way to run tests
Matt Corgan 2012-08-26, 22:57
Hi devs, are there any commands to pass to "mvn test" to get it to run the tests more aggressively? I'm trying to run it on an i7 / 32G / SSD, and I'm only seeing 10 or 20% CPU usage and negligible iowait. I tried "mvn -T 2C test", which is supposed to run 2 threads per core, but I'm not sure it's making a difference.
Maybe there are some other options I don't know about. I know a ton of work has gone into speeding up tests, so please don't read this as a criticism!
Thanks, Matt
-
Re: fastest way to run tests
N Keywal 2012-08-27, 06:01
Hi Matt,
The fastest way to run the tests is to use a ramdrive and as many processes as possible.
mvn -Dtest.build.data.basedirectory=/ramdrive test -P runAllTests -Dsurefire.secondPartThreadCount=12
=> -Dtest.build.data.basedirectory => use the given directory to write the test data. To create the ramdrive:
   sudo mkdir /ramdrive
   sudo mount -t tmpfs -o size=2000M tmpfs /ramdrive
   It must be cleaned before running another test.
=> -P runAllTests => run all tests. Without this parameter, only small and medium tests are executed.
=> -Dsurefire.secondPartThreadCount=12 => execute 12 tests in parallel. Can be increased.
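A minimal Java sketch of the same workflow, assuming the tmpfs is already mounted at /ramdrive as above: it empties the data directory between runs (keeping the mount point itself) and assembles the Maven command line described in this message. The class and method names are hypothetical; only the flags come from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; not part of HBase or Maven.
final class RamdriveTestRun {

    // Deletes everything under dir but keeps dir itself (e.g. the tmpfs mount point).
    static void cleanContents(Path dir) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.filter(p -> !p.equals(dir)).forEach(paths::add);
        }
        // Deepest entries first, so each directory is already empty when it is deleted.
        paths.sort((a, b) -> Integer.compare(b.getNameCount(), a.getNameCount()));
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Builds the mvn invocation from this thread; threads = number of parallel test processes.
    static List<String> mavenCommand(Path dataDir, int threads) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mvn");
        cmd.add("-Dtest.build.data.basedirectory=" + dataDir);
        cmd.add("test");
        cmd.add("-P");
        cmd.add("runAllTests");
        cmd.add("-Dsurefire.secondPartThreadCount=" + threads);
        return cmd;
    }
}
```

The returned list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` after `cleanContents` has run, so every test run starts from an empty data directory.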
Cheers,
N.
-
Re: fastest way to run tests
Jesse Yates 2012-08-27, 06:41
Do we want to add this to the reference guide? I know it's something I'd forget...
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
-
Re: fastest way to run tests
Lars George 2012-08-27, 06:51
+1 on adding this.
-
Re: fastest way to run tests
N Keywal 2012-08-27, 07:29
I agree, I should have documented this.
I'm currently working on getting back to the official version of Surefire (HBASE-4955), and I will simplify this as well. For example, I will rename "surefire.secondPartThreadCount" to "threadCount" (the standard name in Surefire). I will update the documentation accordingly.
-
Re: fastest way to run tests
Ted Yu 2012-08-27, 21:16
Thanks, N, for sharing the knowledge.
Looks like some tests need to use random ports:
testStopDuringStart(org.apache.hadoop.hbase.master.TestMasterNoCluster): Problem binding to sea-lab-0/10.249.196.101:60000 : Address already in use
testFailover(org.apache.hadoop.hbase.master.TestMasterNoCluster): Problem binding to sea-lab-0/10.249.196.101:60000 : Address already in use
testCatalogDeploys(org.apache.hadoop.hbase.master.TestMasterNoCluster): Problem binding to sea-lab-0/10.249.196.101:60000 : Address already in use
Cheers
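The failures above come from several test processes binding the fixed port 60000. A common fix, sketched here with plain java.net rather than any HBase API (the class name is hypothetical), is to bind port 0 so the OS assigns a free ephemeral port, then read back the port it actually chose.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration: let the OS pick a free port instead of hard-coding 60000.
final class EphemeralPort {

    // Opens a server socket on an OS-assigned port on the loopback interface.
    static ServerSocket bindAnyFreePort() throws IOException {
        // Port 0 asks the OS for any free ephemeral port; backlog 50 is the JDK default.
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }
}
```

`getLocalPort()` on the returned socket gives the assigned port. Keeping the socket bound (or passing 0 straight to the server under test) matters: probing for a free port, closing it, and reusing the number later is racy, since another parallel test can grab it in between.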
-
Re: fastest way to run tests
Stack 2012-08-27, 21:17
File an issue, Ted?
St.Ack
-
Re: fastest way to run tests
Matt Corgan 2012-08-28, 00:23
Thanks, that helps a good bit. I do get some failures from things like "Filesystem closed" and "Address already in use", but it sounds like that's not unexpected right now.
-
Re: fastest way to run tests
Jimmy Xiang 2012-08-28, 03:06
How can I verify that the thread count setting is used?
Will jps show that many surefire processes?
Thanks, Jimmy
-
Re: fastest way to run tests
N Keywal 2012-08-28, 05:35
Yes, jps will show the processes when you start the medium and large tests (see section 15.5.2, "Unit Tests", in the HBase reference guide).
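To check the parallelism from Java instead of eyeballing jps, one option (an assumption, not something the HBase build provides) is to scan `ProcessHandle.allProcesses()` (Java 9+) and count command lines that mention `surefirebooter`, the jar that forked Surefire JVMs typically run. Matching on plain "surefire" would also count the parent mvn process, whose command line contains `-Dsurefire.secondPartThreadCount`. Command lines of other users' processes may be hidden, so treat the result as a lower bound. The class name is hypothetical.

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: counts running processes whose command line mentions a Surefire fork.
final class SurefireProcessCounter {

    // Pure counting logic, separated out so it can be checked without real processes.
    static long countMatching(Stream<Optional<String>> commandLines, String needle) {
        return commandLines
                .filter(Optional::isPresent)
                .map(Optional::get)
                .filter(cmd -> cmd.contains(needle))
                .count();
    }

    // Scans live processes; commandLine() is empty when the OS does not expose it to us.
    static long countRunningSurefireForks() {
        return countMatching(
                ProcessHandle.allProcesses().map(p -> p.info().commandLine()),
                "surefirebooter");
    }
}
```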
Hadoop-qa builds are executed with 4 processes. If you get "Address already in use" or "Filesystem closed" locally, it's a bug or a regression. We will need to increase the number of processes from time to time to keep test execution time reasonable as new tests are added.
Rules are (section 15 again):
- As much as possible, tests should be written as category small tests.
- All tests must be written to support parallel execution on the same machine, hence they should not use shared resources such as fixed ports or fixed file names.
- Tests should not overlog. More than 100 lines/second makes the logs hard to read and uses I/O that is then not available to the other tests.
- Tests can be written with HBaseTestingUtility. This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.
Categories and execution time:
- All tests must be categorized; if not, they could be skipped.
- All tests should be written to be as fast as possible.
- Small category tests should last less than 15 seconds, and must not have any side effects.
- Medium category tests should last less than 50 seconds.
- Large category tests should last less than 3 minutes. This should ensure good parallelization for people using it, and ease the analysis when a test fails.
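The "no fixed file names" rule can be illustrated with plain java.nio. This is not HBaseTestingUtility's API, just a stdlib sketch with a hypothetical class name: each test gets its own uniquely named directory, and try-with-resources removes it afterwards, so parallel test processes never share paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical per-test scratch directory that deletes itself on close().
final class ScratchDir implements AutoCloseable {
    final Path path;

    ScratchDir(String testName) throws IOException {
        // createTempDirectory appends a random suffix, so concurrent tests get distinct paths.
        this.path = Files.createTempDirectory(testName + "-");
    }

    @Override
    public void close() throws IOException {
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse path order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```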
Cheers,
Nicolas
-
Re: fastest way to run tests
Jimmy Xiang 2012-08-30, 18:50
This is very helpful, thanks!
I've played with it a bit over the past few days. Testing is much faster, which is great.
However, there is one issue. With parallel testing, some tests are very likely to be killed. As a result, the build can either (1) fail, or (2) succeed with a smaller total number of "Tests run".
I thought these killed tests were hanging, but they pass when not run in parallel.
Is this a test case issue, or an issue with surefire?
Some of the red Jenkins builds happen because some tests are killed, as findHangingTest.sh shows them hanging.
I was wondering: before marking the build red, can we find those tests and run them one more time, not in parallel? This could be done in a Jenkins build script, right?
Thanks,
Jimmy
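Jimmy's proposal amounts to a second, sequential pass over the tests that failed or were killed in the parallel run. A minimal sketch, with a hypothetical `TestRunner` interface standing in for whatever actually launches a single test class (a Jenkins script, a Maven invocation, and so on). As N argues in the reply, a pass on the retry can hide genuine flakiness, so the retried names are worth reporting rather than discarding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical outcome of one test class in the parallel run.
enum Outcome { PASSED, FAILED, KILLED }

// Hypothetical hook that runs a single test class on its own and reports whether it passed.
interface TestRunner {
    boolean runAlone(String testClass);
}

final class SequentialRetry {

    // Reruns every non-passing test once, one at a time; returns the tests that still fail.
    static List<String> retryNonPassing(Map<String, Outcome> parallelResults, TestRunner runner) {
        List<String> stillFailing = new ArrayList<>();
        for (Map.Entry<String, Outcome> e : parallelResults.entrySet()) {
            if (e.getValue() != Outcome.PASSED && !runner.runAlone(e.getKey())) {
                stillFailing.add(e.getKey());
            }
        }
        return stillFailing;
    }
}
```

The build would go red only if the returned list is non-empty.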
-
Re: fastest way to run tests
N Keywal 2012-08-30, 21:12
Hi Jimmy,
On Thu, Aug 30, 2012 at 8:50 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
> Is this a test case issue, or an issue with surefire?
There are issues with surefire, but in this case it's 90% us.
> Some of those red jenkins builds are because some tests are killed
> as findHangingTest.sh shows them hanging.
Yeah, surefire should have killed them, but didn't. There is a JIRA in Surefire for this; it's not a totally trivial fix.
> I was wondering before showing the build red, can we find those tests, then
> run them not in parallel one more time? This could be done in a jenkin
> build script, right?
Yes, it could (it actually used to exist; it's in dev-support). It could even be done in Surefire; I think I've seen a JIRA for this. But I don't think it's a good direction for us:
- Issues with parallelization come from fixed ports and so on. But on a dev machine there are many reasons for a port to be taken: you're running a local cluster, some other software took it by accident, and so on. Tests should run in any reasonable environment.
- Parallelization exposes issues because it shakes the machine, but most of the time a test that fails under parallelization will also fail on its own if you try a few times.
- Test flakiness can actually be HBase flakiness; see for example HBASE-5569. Or it can come from a misunderstanding of something important, as in HBASE-6175.
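N's second point suggests a cheap local check before blaming parallelism: run the suspect test several times on its own and count the failures. A sketch, with a hypothetical `Callable` standing in for one test execution (the class name is also hypothetical):

```java
import java.util.concurrent.Callable;

// Hypothetical helper: repeats one test execution and counts how often it fails.
final class FlakinessProbe {

    // Returns the number of failed runs out of `runs`; a thrown exception counts as a failure.
    static int countFailures(Callable<Boolean> singleRun, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                if (!singleRun.call()) {
                    failures++;
                }
            } catch (Exception e) {
                failures++;
            }
        }
        return failures;
    }
}
```

Any non-zero count on a quiet machine points at the test (or HBase) rather than at the parallel run.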
So I would personally recommend the hard way, i.e. fixing the flaky tests.
The situation is also much better now than a year ago. A year ago it was impossible for me to get a full run of tests without errors. Now it happens (sometimes).
Cheers,
N.