-
maintaining stable HBase build
Ted Yu 2011-09-24, 10:51

Hi,

I want to bring the importance of maintaining a stable HBase build to our attention. A stable build matters not just for the next release but also for authors of pending patches, who need it to verify the correctness of their work.

At some point on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK builds were all blue. Now they're all red.

I don't mind fixing the Jenkins build. But if we collectively adopt some good practices, it will be easier to keep the builds stable.

For contributors, I understand that the whole test suite takes so long to run that you may not have the luxury of running it, and Apache Jenkins won't run it when you press the Submit Patch button. If this is the case (let's call it scenario A), please use Eclipse (or another tool) to identify the tests that exercise the classes/methods in your patch and run them. Also clearly state in the JIRA which tests you ran.

If you have a Linux box where you can run the whole test suite, please use it and run the whole suite. Then state this fact on the JIRA as well. Considering Todd's suggestion of holding off a commit for 24 hours after code review, a 2 hour test run isn't that long.

Sometimes you may see the following (from 0.92 build 18):

Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:51:41.797s

You should examine the test summary above these lines and find out which test(s) hung. In this case it was TestMasterFailover:

Running org.apache.hadoop.hbase.master.TestMasterFailover
Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec

I think a script should be developed that parses the test output and identifies hanging test(s).

For scenario A, I hope the committer would run the test suite. The net effect would be a statement on the JIRA saying all tests passed.

Your comments/suggestions are welcome.
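The script suggested above could look something like the following. This is only a minimal sketch, not an existing HBase tool: the function name find_hanging_tests is hypothetical, and it assumes the log has the shape shown in the message (a "Running <class>" line per test class, followed by a per-class "Tests run: ... Time elapsed: ..." line once that class finishes), and that a result line belongs to the most recently started class that has not yet reported.

```python
import re

# A test class announces itself with "Running <fully.qualified.ClassName>".
RUNNING = re.compile(r"^Running (\S+)")
# Per-class result lines carry "Time elapsed"; the final suite summary does not,
# so the summary line is deliberately not matched here.
RESULT = re.compile(r"^Tests run: \d+, .*Time elapsed:")


def find_hanging_tests(lines):
    """Return test classes that printed 'Running ...' but never reported a result.

    Assumption: each result line closes the most recently started class
    that is still pending.
    """
    pending = []
    for raw in lines:
        line = raw.strip()
        started = RUNNING.match(line)
        if started:
            pending.append(started.group(1))
        elif RESULT.match(line) and pending:
            pending.pop()
    return pending


if __name__ == "__main__":
    sample = [
        "Running org.apache.hadoop.hbase.master.TestMasterFailover",
        "Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable",
        "Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec",
        "Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21",
    ]
    for name in find_hanging_tests(sample):
        print("possibly hung:", name)
```

On the excerpt from 0.92 build 18 this reports TestMasterFailover, which matches the manual reading of the log. If tests run in parallel, the "most recent class" assumption no longer holds and the pairing would need the class name from the result itself.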
-
Re: maintaining stable HBase build
Andrew Purtell 2011-09-24, 16:13

+1

This:

>>>
> For contributors, I understand that it takes so much time to run whole test
> suite that he/she may not have the luxury of doing this - Apache Jenkins
> wouldn't do it when you press Submit Patch button.
> If this is the case (let's call it scenario A), please use Eclipse (or other
> tool) to identify tests that exercise the classes/methods in your patch and
> run them. Also clearly state what tests you ran in the JIRA.
<<<

and

>>>
> For scenario A, I hope committer would run test suite.
<<<

should be added to the How To Contribute page, IMHO.

I see that HBASE-4014 went in -- which is important, so let's fix it and try again -- and then went right out again, reverted after 35 minutes. It should never have gone in if only to be reverted 35 minutes later. (What happened?) Scrolling further down the commit history for trunk, there is a series of half-commits, addendums, reverts, reverts of reverts, etc.

It has recently become difficult to cherry pick any single commit from trunk and get all of the necessary parts of a change together, or have any assurance the change is not toxic. This is not just a maintainer issue -- diffing the full extent of a change to understand it fully mixes in unrelated changes between the initial commit and addendums, unless one resorts to octopus-like contortions with git.

So what is the solution? Submitted for your consideration:

Committers should apply a candidate change and run the full test suite before committing the change to trunk or any branch. If applying a change to a branch, a full test suite run of the branch code should complete successfully before commit there as well.

No patch is so pressing that it cannot wait for tests to finish before commit, IMO.

If a test fails, the patch does not go in.

If a test fails repeatedly for unrelated reasons, the test comes out and a jira to fix it gets opened.

Finally, I can see where people are trying to fix the build, so please exclude those commits from my complaint here; that is not part of the problem.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: maintaining stable HBase build
Ramakrishna S Vasudevan 0... 2011-09-24, 16:16

Hi Ted,

I agree with you. Pasting the test case results in the JIRA is also fine, mainly when some test cases fail in a local run; if we believe the failures are not caused by the fix we added, we can mention that as well. I think it is better to run on a Linux box than on a Windows machine.

+1 for your suggestion, Ted.

Can we add a feature like the one in HDFS, where Jenkins automatically runs the test cases when we submit a patch?

At least until this is done, I will go with your suggestion.

Regards
Ram
-
Re: maintaining stable HBase build
Ted Yu 2011-09-24, 16:31

>> It should never have gone in if only to be reverted 35 minutes later.
>> (What happened?)

Since both Gary and Eugene have been working on HBASE-4014 for quite some time, I didn't initially question the test cases. After integrating the patch for TRUNK, I discovered that TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac and Linux. So I backed it out. I first thought of disabling this particular test but later abandoned that idea: if a core test fails, the feature may have an issue. I notified Eugene immediately and he will take a look today.

>> Scrolling down the commit history for trunk further, is a series of
>> half-commits, addendums, reverts, reverts of reverts, etc.

If you were talking about HBASE-4132 <https://issues.apache.org/jira/browse/HBASE-4132>, I initially tried to salvage the JIRA by adjusting the triggering assertion. However, that turned out to be not so trivial. So I reopened the JIRA.

Just FYI
-
Re: maintaining stable HBase build
Andrew Purtell 2011-09-24, 16:42

Thanks Ted.

> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.

This is understandable but I think we should just not have this kind of trust. :-) I've been burned before too, by committing something that I thought was fine because of who submitted it. You can never know.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: maintaining stable HBase build
Gary Helmling 2011-09-24, 19:03

> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.
> After integrating the patch for TRUNK, I discovered that
> TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac
> and Linux. So I backed it out.
> I first thought of disabling this particular test but later abandoned that
> idea - if a core test fails, this means the feature may have issue.
> I notified Eugene immediately and he will take a look today.

Ted, I did say that I would commit this change. But I was still in the process of verifying the tests, so I was a bit surprised to see that it had been committed. Running the tests had already uncovered one issue (HBASE-4472). I understand that maybe I'm taking longer than some might like -- tests do take a long time to run and I was traveling yesterday. I do appreciate your follow up, but I don't see the need for this patch to have been rushed.

And seconding Andy's thought, don't take my word for it working! :) It was contingent on tests passing, which I had yet to confirm. Sorry if I wasn't clear on that.

I'm happy to see the effort going into improving our test situation, both speeding up our current tests and separating out test groups. Props to all who have been contributing to that. Anything we can do to streamline the patch verification process will make it easier for all to follow it.
-
Re: maintaining stable HBase build
Ted Yu 2011-09-24, 19:56

Gary:

From your comment in the jira on Sept 23rd, it wasn't clear that you were running the test suite. Since I have been involved in the review of 4014, I took the action of integrating it, which was premature.

I think in the future we should use clear language, especially in the final stages of review. We should indicate whether a +1 comes with a test suite run or not. When multiple committers are on the same JIRA (4455 was reviewed by 5 committers), the person planning to commit should state that intention clearly.

Thanks Gary.
-
Re: maintaining stable HBase build
Ted Yu 2011-09-25, 14:45

I wrote a short blog post:
http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html

It is geared towards contributors.

Cheers
-
Re: maintaining stable HBase build
Ted Yu 2011-09-25, 20:41

As of 1:38 PST Sunday, the three builds all passed.

I think we have some tests that exhibit non-deterministic behavior. I suggest committers space patch submissions 2 hours apart so that we can more easily identify the patch(es) that break the build.

Thanks
-
Re: maintaining stable HBase build
lars hofhansl 2011-09-25, 21:27

At Salesforce we call these "flappers" and they are considered almost worse than failing tests, as they add noise to a test run without adding confidence. A test that fails once in, say, 10 runs is worthless.
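One rough way to spot such flappers is to compare the outcome of each test across several runs of the same revision. The sketch below is purely illustrative: find_flappers and the run-history format (one dict per run, mapping test name to passed/failed) are assumptions made for the example, not part of any HBase or Jenkins tooling.

```python
from collections import defaultdict


def find_flappers(runs):
    """Return tests that both passed and failed across the given runs.

    runs: list of dicts mapping test name -> True (passed) or False (failed).
    Assumption: all runs were made against the same code, so a mixed outcome
    points at the test rather than at a change.
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
    # A flapper has been seen in both states; always-failing tests are excluded.
    return sorted(test for test, seen in outcomes.items() if seen == {True, False})


if __name__ == "__main__":
    history = [
        {"TestA": True, "TestB": True},
        {"TestA": False, "TestB": True},
        {"TestA": True, "TestB": True, "TestC": False},
    ]
    print(find_flappers(history))
```

A test that always fails is left out on purpose: it is a plain failure, not noise.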
-
Re: maintaining stable HBase build Gaojinchao 2011-09-26, 01:34
+1. We should run all test cases before submitting a patch, and then state this fact on the JIRA as well. I will do it.
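The opening message of this thread asks for a script that parses test output and identifies hanging tests. A minimal sketch of one way to do that follows. It assumes sequential (non-parallel) Surefire console output in which every test class prints a "Running <class>" line and, on completion, a "Tests run: ... Time elapsed: ..." line. The function name find_hung_tests is hypothetical, and the heuristic would need rework for parallel or prefixed log output.

```shell
#!/bin/bash
# Sketch: list test classes that started ("Running <class>") but never
# reported a per-class "Tests run: ... Time elapsed: ..." result line.
# Assumes sequential Surefire console output; reads files given as
# arguments, or standard input when none are given.
find_hung_tests() {
  awk '
    /^Running / {
      # A new class started while the previous one had no result yet:
      # report the previous class as hung.
      if (pending != "") print pending
      pending = $2
      next
    }
    # Only per-class result lines carry "Time elapsed"; the final
    # summary line does not, so it never clears a pending class.
    /^Tests run:.*Time elapsed:/ { pending = "" }
    END { if (pending != "") print pending }
  ' "$@"
}
```

Run against the 0.92 build 18 excerpt quoted in the opening message, this should report TestMasterFailover, matching the manual diagnosis given there. Typical use would be find_hung_tests consoleText, where consoleText is a saved copy of the Jenkins console log.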
-
Re: maintaining stable HBase build Ted Yu 2011-09-26, 09:08
Below is a simple script to repeatedly run a unit test.
I suggest using it or a similar script on the new unit test(s) in future patches.

#!/bin/bash
# script to run a test repeatedly, stopping at the first failure
# usage: ./runtest.sh <name of test> <number of repetitions>
#
for (( i = 1 ; i <= $2; i++ ))
do
  # run at reduced priority (niceness +10)
  nice -n 10 mvn test -Dtest="$1"
  if [ $? -ne 0 ]; then
    echo "$1 failed"
    exit 1
  fi
done

Thanks
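A rough estimate may help when picking the <number of repetitions> argument. The sketch below assumes, hypothetically, that a flaky test fails independently on each run with a fixed probability p; real flaky tests often violate this, so treat the result as a lower bound on effort rather than a guarantee. Requiring a 95% chance of seeing at least one failure in n runs gives:

```latex
P(\text{at least one failure in } n \text{ runs}) = 1 - (1 - p)^{n} \ge 0.95
\quad\Longrightarrow\quad
n \ge \frac{\ln 0.05}{\ln (1 - p)}
```

For the "fails once in 10 runs" case mentioned earlier in the thread, p = 0.1 gives n >= ln(0.05) / ln(0.9), which is about 28.4, so roughly 29 repetitions.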
-
Re: maintaining stable HBase build Jesse Yates 2011-09-26, 17:16
Ted,
There is a ticket (HBASE-4480) up for wrapping tests in a retry script for failed tests (though no work has been done on it yet). Maybe we can incorporate this script into that ticket?
-Jesse Yates
-
Re: maintaining stable HBase build Ted Yu 2011-09-26, 17:26
That would be nice, Jesse.
Thanks
-
Re: maintaining stable HBase build Jonathan Hsieh 2011-09-26, 17:44
I've been hunting down some flaky tests as well. A few weeks back I was testing some changes along the lines of HBASE-4326. (Maybe some of these are fixed?)

First, two tests seemed to flake fairly frequently, and the problems were likely internal to the tests (TestReplication, TestMasterFailover).

There is a second set of failures that, after applying a draft of HBASE-4326, seems to move to a different set of tests. I'm pretty convinced there are some cross-test problems with these. This was on a 0.90.4-based branch, and by now several more changes have gone in. I'm getting back to HBASE-4326 and will try to get more stats on this.

Alternatively, I exclude the tests that I identify as flaky from the main test run and have a separate test run that only runs the flaky tests. The hooks for the excludes build are in the hbase pom but only work with maven surefire 2.6, or 2.10 when it comes out (there is a bug in surefire). See this jira for more details:
http://jira.codehaus.org/browse/SUREFIRE-766

Jon.

// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
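The "separate test run that only runs the flaky tests" could be sketched as below. This is an illustration only and does not use the exclude hooks in the hbase pom, whose names are not given in this thread. The function name run_flaky_only and the MVN and LOG variables are hypothetical; the only Maven feature relied on is the -Dtest=<name> selector already used elsewhere in this thread.

```shell
#!/bin/bash
# Sketch: run a list of known-flaky test classes one at a time and print
# the names of the ones that failed.
# MVN (default: mvn) and LOG (default: flaky-run.log) are hypothetical
# knobs so the loop can be dry-run without Maven.
run_flaky_only() {
  local failed=()
  local t
  for t in "$@"; do
    # Maven output goes to the log so that stdout carries only test names.
    if ! "${MVN:-mvn}" test -Dtest="$t" >> "${LOG:-flaky-run.log}" 2>&1; then
      failed+=("$t")
    fi
  done
  # One failed class per line; non-zero status if anything failed.
  if [ "${#failed[@]}" -gt 0 ]; then
    printf '%s\n' "${failed[@]}"
    return 1
  fi
  return 0
}
```

With the two tests named above, the call would be run_flaky_only TestReplication TestMasterFailover. Keeping this run separate means a flap there does not turn the main build red.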
-
Re: maintaining stable HBase build lars hofhansl 2011-09-26, 17:45
I was thinking more along these lines:
Either fix the test so it does not flap, or remove it. The first task would be to identify all tests that frequently show non-deterministic results.
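One way to approach the "identify all tests that frequently show non-deterministic results" step is to run a test a fixed number of times without stopping at the first failure and count how often it fails. The sketch below is a variation on the runtest.sh idea from this thread, not an existing HBase tool; count_failures, MVN and LOG are hypothetical names.

```shell
#!/bin/bash
# Sketch: run one test N times, keep going after failures, and print
# "<test> failed <k> of <N> runs".
# MVN (default: mvn) and LOG (default: /dev/null) are hypothetical knobs;
# point LOG at a file to keep the Maven output for later inspection.
count_failures() {
  local test_name=$1 runs=$2 failures=0 i
  for (( i = 1; i <= runs; i++ )); do
    if ! "${MVN:-mvn}" test -Dtest="$test_name" >> "${LOG:-/dev/null}" 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  echo "$test_name failed $failures of $runs runs"
}
```

A test that reports anything other than 0 or all failures over a reasonable number of runs is a candidate "flapper" to fix or remove.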
-
Re: maintaining stable HBase build Ramakrishna S Vasudevan 0... 2011-09-26, 18:31
Hi
Just wanted to share something I learnt today about running test cases in Maven. Many of you may already know it.

We often see that, when test cases are run as one large batch, a few fail because of system problems or improper cleanup by earlier test cases.

As Jon suggested, we can separate the flaky test cases from the reliable ones.

Maven has a facility called profiles. We can split the separated test cases into batches (maybe 2 or 3) and assign each batch to its own profile.

A profile is invoked like this: mvn test -P "profileid".

We can write a script that executes every profile and, between profiles, kills any Java processes left hanging by a hung test case.
Just a suggestion. If it suits the needs of any of your project work, you can use it.

Regards
Ram
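The script described above might look roughly like the following sketch. It assumes GNU coreutils timeout is available and uses it to bound each profile run instead of hunting for stray Java processes by hand. The profile ids, the function name run_profiles, and the MVN and LOG variables are all hypothetical; forked test JVMs that survive the timeout may still need separate cleanup.

```shell
#!/bin/bash
# Sketch: run each Maven profile under a time limit so that a hung test
# cannot block the profiles that follow. Relies on GNU coreutils `timeout`.
# usage: run_profiles <time limit> <profile id>...
# MVN (default: mvn) and LOG (default: profiles.log) are hypothetical knobs.
run_profiles() {
  local limit=$1; shift
  local profile status
  for profile in "$@"; do
    # Send TERM at the limit, then KILL if the run is still alive 30s later.
    timeout --kill-after=30s "$limit" "${MVN:-mvn}" test -P "$profile" \
      >> "${LOG:-profiles.log}" 2>&1
    status=$?
    case $status in
      0)       echo "$profile: passed" ;;
      124|137) echo "$profile: timed out after $limit" ;;  # 137 = killed by SIGKILL
      *)       echo "$profile: failed (exit $status)" ;;
    esac
  done
}
```

For example, run_profiles 2h batch1 batch2 flaky would run three hypothetical profiles with a two-hour cap each. As noted in the reply that follows, killing a hung run hides the cause, so it is worth capturing thread dumps before the limit expires.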
-
Re: maintaining stable HBase build Ted Yu 2011-09-26, 18:35
>> we can kill the java processes that are hanging if any testcases hangs.
I think it is very important to find out why certain tests hang. Obtaining a jstack thread dump is the first step of the investigation.
Regards
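Collecting the jstack output can be scripted so that it happens before a hung run is killed. The sketch below uses the JDK's jps -l to list running JVMs and jstack <pid> to dump their threads. The default match pattern "surefire" is an assumption about how forked test JVMs appear in the jps listing and may need adjusting; dump_stacks is a hypothetical name.

```shell
#!/bin/bash
# Sketch: save a thread dump for every running JVM whose `jps -l` line
# matches a pattern.
# usage: dump_stacks [pattern] [output directory]
# The default pattern "surefire" is an assumption; adjust it for your build.
dump_stacks() {
  local pattern=${1:-surefire} outdir=${2:-.}
  local pid name
  mkdir -p "$outdir"
  # jps -l prints "<pid> <main class or jar path>" for each visible JVM.
  jps -l | grep -- "$pattern" | while read -r pid name; do
    jstack "$pid" > "$outdir/jstack-$pid.txt" 2>&1
    echo "wrote $outdir/jstack-$pid.txt ($name)"
  done
}
```

Taking two or three dumps a few seconds apart usually makes it easier to tell a genuinely stuck thread from one that is merely slow.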
-
Re: maintaining stable HBase buildlars hofhansl 2011-09-26, 18:37
Or, if you wanted to run a test with a known problem until it fails:

while mvn test -Dtest=<test>; do echo "Succeeded, running again"; done

I sometimes run this at night for tests with rare race conditions.

________________________________
From: Ted Yu <[EMAIL PROTECTED]>
To: dev@hbase.apache.org; lars hofhansl <[EMAIL PROTECTED]>
Sent: Monday, September 26, 2011 2:08 AM
Subject: Re: maintaining stable HBase build

Below is a simple script to repeatedly run a unit test.
I suggest using it or similar script on the new unit test(s) in future patches.

#!/bin/bash
# script to run test repeatedly
# usage: ./runtest.sh <name of test> <number of repetitions>
#
for (( i = 1 ; i <= $2; i++ ))
do
  nice -10 mvn test -Dtest=$1
  if [ $? -ne 0 ]; then
    echo "$1 failed"
    exit 1
  fi
done

Thanks
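The two loops above (run until the first failure, or run a fixed number of times) can be combined into one helper. What follows is only a minimal sketch, not something posted in the thread: `run_until_failure` and its arguments are hypothetical names, and it sticks to plain `sh` syntax rather than the bash-only `for (( ... ))` form.

```shell
#!/bin/sh
# Sketch: repeat a command until it fails or a run limit is reached.
# run_until_failure is an illustrative name, not part of HBase or Maven.

# usage: run_until_failure <max_runs> <command> [args...]
# Returns 0 if every run succeeded, 1 as soon as one run fails.
run_until_failure() {
  max_runs=$1
  shift
  run=1
  while [ "$run" -le "$max_runs" ]; do
    if ! "$@"; then
      echo "run $run of $max_runs failed: $*"
      return 1
    fi
    run=$((run + 1))
  done
  echo "all $max_runs runs succeeded: $*"
}

# Example (assumes Maven and the test class are available; not executed here):
#   run_until_failure 20 mvn test -Dtest=TestReplication
```

Reporting the run number on failure gives a rough idea of how rare the failure is, which is useful when deciding whether a test counts as a flapper.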
-
Re: maintaining stable HBase build
Ted Yu 2011-09-26, 19:57
From TRUNK build 2259:

Failed tests: queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication

I know Doug's change wouldn't have caused the above failure.

FYI

On Mon, Sep 26, 2011 at 10:45 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> I was thinking more along the lines:
> Either fix the test to not flap, or remove it.
>
> The first task would be to identify all tests that frequently show
> non-deterministic results.
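One way to approach the "first task" lars mentions, identifying tests that fail repeatedly, is to count how often each test shows up in "Failed tests:" lines across saved build logs. This is a rough sketch under assumptions: the function name `count_failed_tests` is made up, the logs are assumed to be saved as local text files, and only lines shaped like the build 2259 excerpt above (one failure per "Failed tests:" line) are recognized.

```shell
#!/bin/sh
# Sketch: count how often each test appears in "Failed tests:" lines.
# Assumes one failure per line, formatted like the build 2259 excerpt:
#   Failed tests: method(fully.qualified.Class): message
# count_failed_tests is an illustrative name.

# usage: count_failed_tests [log files...]   (reads stdin if no files are given)
count_failed_tests() {
  # Keep only the "method(Class)" part of each matching line, then tally.
  sed -n 's/^ *Failed tests: *\([A-Za-z0-9_]*([A-Za-z0-9_.]*)\).*/\1/p' "$@" |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example (hypothetical file names; not executed here):
#   count_failed_tests trunk-2257.log trunk-2258.log trunk-2259.log
```

Tests at the top of the output fail most often across the supplied logs. Failures that are listed on continuation lines, or under other headings, would need extra patterns.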
-
Re: maintaining stable HBase build
Andrew Purtell 2011-09-26, 22:19
A slow host (or busy running tests from other projects concurrently...) can cause failures in the replication tests.
- Andy

----- Original Message -----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: lars hofhansl <[EMAIL PROTECTED]>
> Cc: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Sent: Monday, September 26, 2011 12:57 PM
> Subject: Re: maintaining stable HBase build
>
> From TRUNK build 2259:
>
> Failed tests: queueFailover(org.apache.hadoop.hbase.replication.TestReplication):
> Waited too much time for queueFailover replication
>
> I know Doug's change wouldn't have caused the above failure.
>
> FYI
-
RE: maintaining stable HBase build
Ramkrishna S Vasudevan 2011-09-27, 04:02
Hi Ted

Yes, we need to investigate hanging tests separately.

Regards
Ram

-----Original Message-----
From: Ted Yu [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 27, 2011 12:05 AM
To: dev@hbase.apache.org
Subject: Re: maintaining stable HBase build

>> we can kill the java processes that are hanging if any testcases hangs.

I think it is very important to find out why certain tests hang.
Obtaining jstack is the first step in terms of investigation.

Regards
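Ted's point in the quoted message, that a jstack should be obtained before a hung test is killed, could be scripted roughly as follows. This is a sketch under assumptions, not a tested part of the HBase build: `run_and_dump_on_hang` and `dump_java_stacks` are hypothetical names, the time limit is in whole seconds, and `jps` and `jstack` are assumed to be on the PATH (they ship with the JDK).

```shell
#!/bin/sh
# Sketch: run a command with a time limit and dump JVM stacks before killing it.
# run_and_dump_on_hang and dump_java_stacks are illustrative names.

# Print a thread dump for every JVM that jps can see (does nothing without a JDK).
dump_java_stacks() {
  command -v jps >/dev/null 2>&1 || return 0
  for jvm in $(jps -q); do
    jstack "$jvm" 2>/dev/null || true
  done
}

# usage: run_and_dump_on_hang <limit_seconds> <command> [args...]
# Returns the command's exit status, or 124 if the limit was reached.
run_and_dump_on_hang() {
  limit=$1
  shift
  "$@" &
  pid=$!
  waited=0
  # Poll once per second until the command exits or the limit is reached.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      echo "still running after ${limit}s, dumping stacks: $*"
      dump_java_stacks
      kill "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
      return 124
    fi
    sleep 1
    waited=$((waited + 1))
  done
  wait "$pid"
}

# Example (hypothetical profile id; not executed here):
#   run_and_dump_on_hang 7200 mvn test -P flaky-batch-1 > batch-1.log
```

Note that killing the `mvn` process does not necessarily stop test JVMs it has forked, so a real script would still need some cleanup between runs.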
-
Re: maintaining stable HBase build
Jonathan Hsieh 2011-09-27, 17:15
These are the flaky tests I've seen failing regularly from some time before
or around the 0.90.4 time until now (we have a few backports that probably
shouldn't affect these tests):

TestReplication.queueFailover
TestMasterFailover.testMasterFailoverWithMockedRIT
TestMasterFailover.testSimpleMasterFailover
TestRollingRestart.testBasicRollingRestart

I also tried backporting HBASE-4453, but the replication queueFailover test
still fails occasionally.

These are being run on machines with several builds of other projects
running on them.

Jon

On Mon, Sep 26, 2011 at 3:19 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> A slow host (or busy running tests from other projects concurrently...) can
> cause failures in the replication tests.
>
> - Andy

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
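To rerun only the classes behind a list like the one above, the names can be collapsed into a single value for Surefire's `-Dtest` option, which accepts comma-separated class names. This is a small sketch with assumptions: `flaky_test_arg` is a made-up helper, and it works at class level only, dropping the method part of each entry.

```shell
#!/bin/sh
# Sketch: turn a list of flaky tests into a value for Surefire's -Dtest option.
# flaky_test_arg is an illustrative name.

# Reads names such as TestReplication.queueFailover on stdin, one per line,
# keeps the class part, drops duplicates, and joins the rest with commas.
flaky_test_arg() {
  sed 's/\..*//' | awk 'NF && !seen[$0]++' | paste -sd, -
}

# Example (hypothetical file name; not executed here):
#   mvn test -Dtest="$(flaky_test_arg < flaky-tests.txt)"
```

Keeping the list in a plain text file makes it easy to add or retire entries as tests are fixed.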