lars hofhansl
2012-12-25, 19:57
Jonathan Hsieh
2012-12-25, 20:11
yuzhihong@...
2012-12-25, 21:17
Andrew Purtell
2012-12-26, 01:29
Ted Yu
2012-12-26, 07:38
ramkrishna vasudevan
2012-12-26, 03:49
Andrew Purtell
2012-12-27, 06:02
Andrew Purtell
2012-12-28, 05:02
lars hofhansl
2012-12-27, 06:29
Stack
2012-12-26, 16:53
Stack
2012-12-26, 17:08
Stack
2012-12-26, 18:03
Enis Söztutar
2012-12-26, 20:02
Andrew Purtell
2012-12-27, 04:05
Jonathan Hsieh
2012-12-27, 19:49
Jonathan Hsieh
2012-12-27, 21:35
Andrew Purtell
2012-12-27, 21:03
lars hofhansl
2012-12-27, 06:37
Enis Söztutar
2012-12-27, 19:26
Ted Yu
2012-12-29, 00:28
lars hofhansl
2012-12-29, 00:34
Jesse Yates
2012-12-29, 19:45
Andrew Purtell
2012-12-31, 18:34
Andrew Purtell
2012-12-29, 00:33
Lars Hofhansl
2012-12-26, 16:47
0.94 tests back in shape and some guidelines
lars hofhansl 2012-12-25, 19:57

During the past few days I spent some time bringing the 0.94 tests back into shape.

GC issues, bad backports, hanging tests, memory issues, you name it.
I do not want to ever have to do that again.

The good news is: The 0.94 tests are back in shape now. Yeah!

If you commit a patch it is your responsibility to make sure it passes the test suite.
Either the tests should be fixed in a reasonable amount of time or the commit should be reverted.
This is mainly for committers; contributors should also watch the test runs for their patches.
No excuses. The tests are passing now.
I do not care whether a test passes locally, or whether it fails rarely, or whether some tests failed previously, or whatever.

Please consider this a condition for me to continue as release manager for 0.94.
(This is only for the 0.94 tests. I cannot speak for HadoopQA or the regular trunk test suite, although I assume we will eventually want similar guidelines there.)

I increased the retention time for past builds. I will find you :)
I will publicly shame you. I will retroactively -1 the change and revert it, and then shame you again. :)

Lastly, this is a function of the large number of contributed patches, so it is a good problem to have.
HBase is an actively maintained project and we certainly want to keep it this way, just with an acknowledgement that keeping the test suite passing is important.

Thanks and Merry Christmas (to whoever celebrates that).

-- Lars

Re: 0.94 tests back in shape and some guidelines
Jonathan Hsieh 2012-12-25, 20:11

+1. Your work is much appreciated!

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]

Re: 0.94 tests back in shape and some guidelines
yuzhihong@... 2012-12-25, 21:17

I agree with what Lars said.
The test suite is our first line of defense.

Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-26, 01:29

Thank you so much for doing this Lars.

> I will retroactively -1 the change and revert it, and then shame you again. :)

+1

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

Re: 0.94 tests back in shape and some guidelines
Ted Yu 2012-12-26, 07:38

I think we should come up with some backup plan in case Jenkins comes to a halt due to IO exceptions.
The following was one example:
https://builds.apache.org/job/PreCommit-HBASE-Build/3696/console

Re: 0.94 tests back in shape and some guidelines
ramkrishna vasudevan 2012-12-26, 03:49

Thanks for your efforts Lars.
+1 on what you said.

Regards
Ram

Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-27, 06:02

Interesting that the runs for 0.94 are good again up on Jenkins but running them locally (@ r1426067) I get:

> $ mvn clean test -PrunAllTests -DskipITs -Djava.net.preferIPv4Stack=true
[...]

Results :

Failed tests:
  testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart): expected:<22> but was:<23>
  testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): ctr=0, oldval=0, newval=1

Tests in error:
  testDelayedDeleteOnFailure(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): test timed out after 25000 milliseconds
  test3686a(org.apache.hadoop.hbase.client.TestScannerTimeout): 11759ms passed since the last invocation, timeout is currently set to 10000
  queueFailover(org.apache.hadoop.hbase.replication.TestReplication): test timed out after 300000 milliseconds
  queueFailover(org.apache.hadoop.hbase.replication.TestReplicationWithCompression): test timed out after 300000 milliseconds
  testRunThriftServer[9](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 60000 milliseconds
  testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster)

Tests run: 1220, Failures: 2, Errors: 6, Skipped: 13

Will start looking at some of these tomorrow.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

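The totals line in a report like the one above ("Tests run: 1220, Failures: 2, Errors: 6, Skipped: 13") can be checked mechanically when comparing a local run against Jenkins. The following is only a sketch in Python; the function names are made up for illustration, and it assumes the overall totals are the last "Tests run:" line in the captured log, which may not hold for every build layout.

```python
import re

# Matches totals lines of the form shown in the report above.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def parse_summary(text):
    """Return the last totals line found in text as a dict, or None.

    Assumption: the overall totals are the final matching line in the log.
    """
    matches = SUMMARY.findall(text)
    if not matches:
        return None
    run, failures, errors, skipped = (int(n) for n in matches[-1])
    return {"run": run, "failures": failures, "errors": errors, "skipped": skipped}


def is_green(summary):
    """A run counts as passing only if totals exist with no failures or errors."""
    return (
        summary is not None
        and summary["failures"] == 0
        and summary["errors"] == 0
    )


totals = parse_summary("Tests run: 1220, Failures: 2, Errors: 6, Skipped: 13")
print(totals, is_green(totals))
```

Treating a missing totals line as "not green" is a deliberate choice here: a run that hangs or dies before printing totals should not be counted as passing.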
Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-28, 05:02

Bisecting the first one, with 50 repetitions, I get https://issues.apache.org/jira/browse/HBASE-7447

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

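The thread does not say how the 50 repetitions were run. One way to do something similar is to run a single test command many times and count the non-zero exits, since a flaky test may pass on any given run. The sketch below is an assumption-laden illustration in Python: `count_failures` is a hypothetical helper, and the Maven command in the comment uses Surefire's `-Dtest` filter, which may need extra profile flags on the 0.94 branch.

```python
import subprocess


def count_failures(cmd, repetitions):
    """Run cmd `repetitions` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(repetitions):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Hypothetical usage (not executed here):
#   failed = count_failures(["mvn", "test", "-Dtest=TestRollingRestart"], 50)
#   print(f"{failed} of 50 runs failed")
```

Used as the per-revision check during a bisection, a revision would be marked bad as soon as the count is above zero; a count of zero only suggests the revision is good, with more repetitions giving more confidence.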
Re: 0.94 tests back in shape and some guidelines
lars hofhansl 2012-12-27, 06:29

I ran those yesterday (or the day before) and despite the hanging tests - since fixed - they all passed for me. hmm...

-- Lars

Re: 0.94 tests back in shape and some guidelines
Stack 2012-12-26, 16:53

Thanks for doing the fixup "Iron Hand". +1 on these rules for a branch or for any branch (we'll have to do the same for trunk when it becomes the 0.96 branch).

Should we add something here: http://hbase.apache.org/book.html#hbase.tests
Or to the community section: http://hbase.apache.org/book.html#community ?
Or to the developer section?

St.Ack

Re: 0.94 tests back in shape and some guidelines
Stack 2012-12-26, 17:08

Or there is a submitting patches section: http://hbase.apache.org/book.html#submitting.patches

St.Ack

Re: 0.94 tests back in shape and some guidelines
Stack 2012-12-26, 18:03

I just added a section to the 'contributing' section on committers being responsible for ensuring contributors' patches do not break the build or tests.

St.Ack

Re: 0.94 tests back in shape and some guidelines
Enis Söztutar 2012-12-26, 20:02
Just a reference of some of the recent efforts that went in:

HBASE-7432 TestHBaseFsck prevents testsuite from finishing
HBASE-7431 TestSplitTransactionOnCluster tests still flaky
HBASE-7417 Test patch, hopefully fixes TestReplication
HBASE-7421 TestHFileCleaner->testHFileCleaning has an aggressive timeout
HBASE-7398 [0.94 UNIT TESTS] TestAssignmentManager fails frequently on CentOS 5
HBASE-7338 Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
HBASE-6175 TestFSUtils flaky on hdfs getFileStatus method
HBASE-7343 Fix flaky condition for TestDrainingServer (Himanshu)
HBASE-7301 Force ipv4 for unit tests
HBASE-7300 HbckTestingUtil needs to keep a static executor to lower the number of threads used
HBASE-6206 Large tests fail with jdk1.7
HBASE-7252 TestSizeBasedThrottler fails occasionally
HBASE-7235 TestMasterObserver is flaky
HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
HBASE-7177 TestZooKeeperScanPolicyObserver.testScanPolicyObserver is flaky
HBASE-7166 TestSplitTransactionOnCluster tests are flaky
HBASE-7165 TestSplitLogManager.testUnassignedTimeout is flaky
HBASE-5984 TestLogRolling.testLogRollOnPipelineRestart failed with HADOOP 2.0.0
HBASE-7142 TestSplitLogManager#testDeadWorker may fail because of hard limit on the TimeoutMonitor's timeout period (Himanshu)
HBASE-7143 TestMetaMigrationRemovingHTD fails when used with Hadoop 0.23/2.x (Andrey Klochlov)
HBASE-6958 TestAssignmentManager sometimes fails
HBASE-6305 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. (Himanshu)
HBASE-6796 ADDENDUM, remove spurious time limit from testHFileCleaning
HBASE-6852, REVERT again, due to unexplained test failures that only occur on the jenkins machines
HBASE-7077 ADDENDUM, add TestCategory
HBASE-6733 TestReplication.queueFailover occasionally fails [Part-2]
HBASE-6906 TestHBaseFsck#testQuarantine* tests are flakey due to TestNotEnabledException
HBASE-6784 TestCoprocessorScanPolicy is sometimes flaky when run locally
HBASE-6714 TestMultiSlaveReplication#testMultiSlaveReplication may fail
HBASE-6715 TestFromClientSide.testCacheOnWriteEvictOnClose is flaky

Please keep these in mind when you are writing a new test.

Enis
Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-27, 04:05
Hmm... How about just adding to the contributor section that new tests should run reliably N times locally. N=10? N=20? N=100?

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: 0.94 tests back in shape and some guidelines
Jonathan Hsieh 2012-12-27, 19:49
Similar to Enis's comments -- it is not just the new tests running N times -- it is that sometimes they leave junk behind that pollutes other tests. I'm currently having similar problems on the trunk+snapshot branch.

I've also been running trunk+snapshot and there are some tests that seem broken there. (I had started on some of them a while back, probably time to get back to them).

Jon.

// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
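One way to keep a test from leaving junk behind, as described in the message above, is to confine everything it writes to a scratch directory that is removed even when the test fails. The sketch below is a generic Python illustration of that idea, not HBase code; the class name `ScratchDirTest` and the file name `region.dat` are made up for the example.

```python
import os
import tempfile
import unittest


class ScratchDirTest(unittest.TestCase):
    """Hypothetical test that keeps all of its files in a private directory."""

    # Remembered only so a caller can check afterwards that nothing was left behind.
    last_scratch = None

    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        # Cleanup callbacks registered here run after the test, pass or fail.
        self.addCleanup(tmp.cleanup)
        self.scratch = tmp.name
        ScratchDirTest.last_scratch = tmp.name

    def test_writes_only_inside_scratch(self):
        path = os.path.join(self.scratch, "region.dat")
        with open(path, "w") as handle:
            handle.write("test data")
        self.assertTrue(os.path.exists(path))
```

Registering the cleanup in `setUp` rather than relying on the test body means a failing assertion cannot skip it, which is the case that usually pollutes later tests.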
Re: 0.94 tests back in shape and some guidelines
Jonathan Hsieh 2012-12-27, 21:35
On Thu, Dec 27, 2012 at 11:49 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> I've also been running trunk+snapshot and there are some tests that
> seem broken there. (I had started on some of them a while back,
> probably time to get back to them).
>
> Jon.

(omitted an important point) -- trunk and trunk+snapshot over hadoop 2 seem to have the same set of broken/flapping tests.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-27, 21:03
Sure, the "N times" thing was simply about suggesting a simple filter on new tests, so contributors don't put up a flapper. It's not meant to be a comprehensive answer to unit testing challenges.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: 0.94 tests back in shape and some guidelines
lars hofhansl 2012-12-27, 06:37
Generally I prefer to leave it up to good judgment as much as possible rather than making hard rules.
There might be simple unit tests that do not race.

I'm happy with a best-effort approach. If a new test passes locally "a few times" and then passes on jenkins, or if it passes HadoopQA, that seems enough. If it turns out to be flaky later we'll have to deal with that.

To make it easier we could also check in a script that runs a given test N times.

If we make it too onerous we'll see fewer contributions, especially in the test area. :)

Another question I have: What do people think about a 0.94 HadoopQA? 0.94 and trunk are sufficiently different in many areas to maybe warrant that. Not sure how this would work, though. How can we determine whether an attached patch file is for 0.94 or 0.96? Maybe by naming convention?

-- Lars
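The "script that runs a given test N times" suggested above could look roughly like the following. This is only a sketch and not anything checked into HBase: the function name `run_repeatedly` is invented here, and the Maven command in the comment assumes the usual Surefire `-Dtest=` selector is how a single test class would be picked.

```python
import subprocess


def run_repeatedly(command, runs):
    """Run `command` `runs` times; return the 1-based numbers of the failed runs."""
    failures = []
    for attempt in range(1, runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A non-zero exit code is treated as a failed run.
        if result.returncode != 0:
            failures.append(attempt)
    return failures


# Hypothetical usage for one test class (command shape is an assumption):
#   failed = run_repeatedly(["mvn", "test", "-Dtest=TestSplitLogManager"], 20)
#   print(f"{len(failed)} of 20 runs failed: {failed}")
```

Returning the numbers of the failed runs, rather than stopping at the first failure, gives a rough failure rate, which is more useful for judging a flapper than a single pass or fail.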
Re: 0.94 tests back in shape and some guidelines
Enis Söztutar 2012-12-27, 19:26
I think it is not a matter of running the tests N times, but more of running them on different platforms. From our builds, what we see most often is that a test runs just fine under CentOS 6, but becomes more flaky under CentOS 5, possibly because of thread scheduling differences. Moreover, under Windows, threads are not immediately scheduled to run after start(), which causes further race conditions that do not occur so frequently under *nix systems.

For 0.94 QA, theoretically we should not need this. However, in practice I see that if there is a brave soul to work on it, we will find it useful.

Enis

On Wed, Dec 26, 2012 at 10:37 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> If we make it too onerous we'll see fewer contributions especially in the
> test area. :)
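The start() race described above usually comes from a test assuming that a freshly started thread has already done its setup. A common remedy is to have the thread signal readiness and let the test wait for that signal with a timeout. The sketch below shows the pattern in Python for brevity; the `Worker` class and `start_and_wait` helper are hypothetical, and in Java-based tests such as HBase's a `java.util.concurrent.CountDownLatch` would typically play the role of the event.

```python
import threading


class Worker:
    """Hypothetical background worker that signals when its setup is done."""

    def __init__(self):
        self.ready = threading.Event()
        self.state = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Setup work happens on the new thread, at a time the scheduler chooses.
        self.state = "initialized"
        self.ready.set()


def start_and_wait(worker, timeout=5.0):
    """Start the worker and block until it reports readiness, or raise."""
    worker.start()
    if not worker.ready.wait(timeout):
        raise TimeoutError("worker did not become ready in time")
    return worker
```

Waiting on an explicit signal makes the test independent of how quickly a given platform schedules the new thread, whereas a fixed sleep only hides the race on fast machines.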
Re: 0.94 tests back in shape and some guidelines
Ted Yu 2012-12-29, 00:28
Since we don't have Hadoop QA for 0.94 patches yet, does it make sense for either the contributor (patch owner) or the committer who plans to integrate the patch to present test suite results before integration?

There are subtle differences between 0.94 and trunk which may lead to unexpected results.

Cheers
Re: 0.94 tests back in shape and some guidelines
lars hofhansl 2012-12-29, 00:34
I feel that would be overkill. The test suite runs for a long time. Personally I hate to run it locally and have my machine taken over for an hour.

If it passed HadoopQA in trunk and a few relevant tests in 0.94 were run (if applicable), I think that is good enough.

If it doesn't pass the 0.94 run we'll find out soon enough. If the test suite is stable, that is :)

The gatekeepers are the release tests, not HadoopQA... IMHO.

-- Lars
Re: 0.94 tests back in shape and some guidelines
Jesse Yates 2012-12-29, 19:45
Do we need to add a maven profile that runs the tests locally exactly as we do them up on the jenkins machines? It would help give some confidence to the people running the tests that it's exactly the same (personally, I find it annoying to have to go look it up on the build machines to do an exact match).

Something like a -P jenkins? It would also put source control on how we run the test/build CI.

Happy to put up a quick patch, if people are interested.

-Jesse
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-31, 18:34
Was going to spend a bit more vacation time on this, but the latest 0.94 is now building successfully again in my local environment. Thanks so much Lars, Jimmy, and Enis, those recent test fix commits have helped a lot.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: 0.94 tests back in shape and some guidelines
Andrew Purtell 2012-12-29, 00:33
I wouldn't mind posting test results on the resolution comment of a 0.94 commit, but there have been a couple of times where all tests have passed for me locally but then failed up on ASF Jenkins. One dev box may not be much like another.

I think the best option is to set up ASF Jenkins for what you think we need, Ted.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: 0.94 tests back in shape and some guidelines
Lars Hofhansl 2012-12-26, 16:47
Agreed. What do you think the backup plan should be?
I think we can request another run to get confidence in the change. What if that fails in the same way?

What we should avoid is piling on new changes while the test suite is failing. That's what happened with the 0.94 test suite.

-- Lars

Ted Yu <[EMAIL PROTECTED]> wrote:

>I think we should come up with some backup plan in case Jenkins comes to a
>halt due to IO exceptions. The following was one example:
>https://builds.apache.org/job/PreCommit-HBASE-Build/3696/console