-
State of the 0.94 tests
lars hofhansl 2012-10-07, 06:11
I've been trying (essentially the entire day) to get a successful Jenkins build for 0.94 (triggering the test run periodically from my phone). Not a *single* run succeeded.
This is clearly not acceptable. Something is off.

The tests that fail most frequently are:
- TestSplitTransactionOnCluster.testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback
- TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState
(Most of the time the failure cause is "too many open files", but they also fail because of unavailable regions.)

Both tests were added recently (since 0.94.2RC2). See HBASE-6854 and HBASE-6853.

Either there is something wrong with the tests, or we introduced some problems in the code base.

Note that I am not dinging these two changes specifically. Both were fixes with a lot of thought and care behind them.

There are also various timeout issues in other tests.

These were all the fixes added since the last RC:
[HBASE-4565] - Maven HBase build broken on cygwin with copynativelib.sh call
[HBASE-6299] - RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
[HBASE-6679] - RegionServer aborts due to race between compaction and split
[HBASE-6688] - folder referred by thrift demo app instructions is outdated
[HBASE-6854] - Deletion of SPLITTING node on split rollback should clear the region from RIT
[HBASE-6871] - HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks
[HBASE-6888] - HBase scripts ignore any HBASE_OPTS set in the environment
[HBASE-6889] - Ignore source control files with apache-rat
[HBASE-6900] - RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
[HBASE-6901] - Store file compactSelection throws ArrayIndexOutOfBoundsException
[HBASE-6906] - TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException
[HBASE-6912] - Filters are not properly applied in certain cases
[HBASE-6916] - HBA logs at info level errors that won't show in the shell
[HBASE-6920] - On timeout connecting to master, client can get stuck and never make progress
[HBASE-6927] - WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS
[HBASE-6946] - JavaDoc missing from release tarballs
[HBASE-5582] - "No HServerInfo found for" should be a WARNING message
[HBASE-6914] - Scans/Gets/Mutations don't give a good error if the table is disabled.
[HBASE-6853] - IllegalArgument Exception is thrown when an empty region is spliitted.

Unless somebody (Ram :) ) speaks up, I will roll back HBASE-6854 and HBASE-6853 (and maybe HBASE-6299).

I could also roll all of these back except HBASE-6920 (which is the one that sank the last RC) and leave the rest for the next RC.

Also, from now on, at least until 0.94.2 is released, please clear all 0.94 changes with me before you commit. There is clearly too much churn going into 0.94 too quickly, which prevents 0.94.2 from stabilizing.

-- Lars
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 06:41
Looks like after all that whining I finally got a successful build.
But I have lost confidence in the current 0.94 code line.

Still, it is possible that all of these were environmental issues. If we can get a few more successful runs, it could be OK.

-- Lars
-
Re: State of the 0.94 tests
Andrew Purtell 2012-10-07, 08:36
Too many open files usually is an environment issue.
Lars, you should consider setting up a private Jenkins as a sanity check.
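To check the "too many open files" theory on a given build machine, one can compare the number of file descriptors a process holds against its limit. The following is a minimal sketch, assuming a Unix-like host with Python 3; the helper name `fd_headroom` is made up for illustration and is not part of HBase or Jenkins.

```python
import os
import resource


def fd_headroom():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Linux exposes one entry per open descriptor under /proc/self/fd;
    # other Unix systems usually offer /dev/fd instead.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # Listing the directory briefly opens it, so the count can be one
    # higher than the number of descriptors held before the call.
    open_fds = len(os.listdir(fd_dir))
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_headroom()
    print(f"open={open_fds} soft_limit={soft} hard_limit={hard}")
```

If the soft limit turns out to be low for the user that runs the build, raising it is probably worth trying before suspecting the tests themselves.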
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 16:21
Probably. It is just strange that these two newly added tests are failing the most frequently.
-- Lars
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 16:26
Another part that is strange is that the security build fails far less frequently (disregard the many runs from yesterday; I needed to fix a problem introduced with HBASE-6920 that broke the security build/test).
The security tests also usually finish in about half the time (~25m as opposed to ~50m). As far as I can see they execute the exact same tests, so that is interesting.

-- Lars
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 16:44
Hmm... the 0.94 build is set up with -PrunAllTests, whereas the security build is not. So that explains it.
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 18:23
I looked back through the failures. I had recently enabled all "ubuntu" build vms for the 0.94 builds.
It turns out that most of the environment issues occur on ubuntu2. I excluded that from the build vms.

-- Lars
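The per-machine comparison described above can be sketched as a small tally of failures by build host. This is only an illustration: the run records below are invented (only the host name ubuntu2 comes from this thread), and `failure_rate_by_host` is a hypothetical helper, not a Jenkins API.

```python
from collections import Counter


def failure_rate_by_host(runs):
    """Map each host to the fraction of its runs that failed.

    `runs` is an iterable of (host, passed) pairs.
    """
    total = Counter()
    failed = Counter()
    for host, passed in runs:
        total[host] += 1
        if not passed:
            failed[host] += 1
    return {host: failed[host] / total[host] for host in total}


# Made-up sample records, NOT taken from the real build history.
SAMPLE_RUNS = [
    ("ubuntu1", True),
    ("ubuntu2", False),
    ("ubuntu2", False),
    ("ubuntu3", True),
    ("ubuntu1", False),
    ("ubuntu2", True),
]

if __name__ == "__main__":
    rates = failure_rate_by_host(SAMPLE_RUNS)
    # Print hosts from worst to best so an outlier stands out.
    for host, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{host}: {rate:.0%} failed")
```

A host whose failure rate sits well above the others is a candidate for exclusion or closer inspection, which is roughly the reasoning applied to ubuntu2 here.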
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-07, 20:36
After this change things look better. Apologies for the noise. Stay tuned for the next RC.
-- Lars
-
RE: State of the 0.94 tests
Ramkrishna.S.Vasudevan 2012-10-08, 04:01
Hi Lars
I was out of town and traveling for the last 2 days. I will check the reason for the test case failures immediately. Had I been here, I would have helped out earlier. Sorry about that.

Regards
Ram
-
Re: State of the 0.94 tests
lars hofhansl 2012-10-08, 04:15
Thanks Ram,

as I said, these are well-thought-out fixes. Sometimes I think you are the only one who actually understands how the assignment/balance process really works :)

With the environment issues out of the way (by not using ubuntu2) the tests fail much less frequently. The frequent failures were actually my fault for enabling ubuntu2 for the Jenkins build in the first place (there was a reason why someone had disabled it before).

testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback still fails occasionally with a failed assertion, but these look like test failures, not production code problems. I commented to this extent on the jira.

TL;DR: I think we're good for the current RC. The test flaps sometimes; we just need to fix that.

-- Lars
-
Re: State of the 0.94 tests
Andrew Purtell 2012-10-07, 08:46
The only tests consistently failing (against Hadoop 2) on our private Jenkins are:

TestReplication (timeouts waiting for truncate)
TestMetaMigrationRemovingHTD
TestLogRolling.testLogRollOnPipelineRestart

It's pretty consistent.

Looks like the last build also failed TestFromClientSide.testPoolBehavior.

On Sunday, October 7, 2012, Andrew Purtell wrote:

> Too many open files usually is an environment issue.
>
> Lars, you should consider setting up a private Jenkins as a sanity check.
>
> On Oct 7, 2012, at 2:41 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Looks like after all that whining I finally got a successful build.
> > But I lost confidence in the current 0.94 code line.
> >
> > Still, it is possible that all of these were environmental issue. If we can get a few more successful runs, it could be OK.
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <[EMAIL PROTECTED]>
> > To: hbase-dev <[EMAIL PROTECTED]>
> > Sent: Saturday, October 6, 2012 11:11 PM
> > Subject: State of the 0.94 tests
> >
> > I've been trying (essentially the entire day) getting a successful jenkins build for 0.94 (triggering the test run periodically from my phone). Not a *single* run succeeded.
> > This is clearly not acceptable. Something is off.
> >
> > The tests that fails the most frequently are:
> > - TestSplitTransactionOnCluster.testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback
> > - TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState
> > (The failure cause most of the time is too many files open, but also fail because of unavailable regions).
> >
> > Both tests were added recently (since 0.94.2RC2). See HBASE-6854 and HBASE-6853.
> >
> > Either there is something wrong with the tests, or we introduced some problems in the code base.
> >
> > Note that I am not dinging these two changes specifically. Both were fixes with a lot of thought and care behind them.
> >
> > There are also various time out issues in other tests.
> >
> > These were all the fixes added since the last RC:
> > [HBASE-4565] - Maven HBase build broken on cygwin with copynativelib.sh call
> > [HBASE-6299] - RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
> > [HBASE-6679] - RegionServer aborts due to race between compaction and split
> > [HBASE-6688] - folder referred by thrift demo app instructions is outdated
> > [HBASE-6854] - Deletion of SPLITTING node on split rollback should clear the region from RIT
> > [HBASE-6871] - HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks
> > [HBASE-6888] - HBase scripts ignore any HBASE_OPTS set in the environment
> > [HBASE-6889] - Ignore source control files with apache-rat
> > [HBASE-6900] - RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
> > [HBASE-6901] - Store file compactSelection throws ArrayIndexOutOfBoundsException
> > [HBASE-6906] - TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException
> > [HBASE-6912] - Filters are not properly applied in certain cases
> > [HBASE-6916] - HBA logs at info level errors that won't show in the shell
> > [HBASE-6920] - On timeout connecting to master, client can get stuck and never make progress
> > [HBASE-6927] - WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS
> > [HBASE-6946] - JavaDoc missing from release tarballs
> > [HBASE-5582] - "No HServerInfo found for" should be a WARNING message
> > [HBASE-6914] - Scans/Gets/Mutations don't give a good error if the table is disabled.
> > [HBASE-6853] - IllegalArgument Exception is thrown when an empty region is spliitted.
> >
> > Unless somebody (Ram :) ) speaks up I will roll back HBASE-6854 and HBASE-6853 (and maybe HBASE-6299)

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: State of the 0.94 tests
Andrew Purtell 2012-10-07, 08:48
I should add we see those TestReplication failures on our old slow Jenkins (3 year old dual core Xeons) but never locally. Its internal waits are too stringent for slow systems.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
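To illustrate the point about internal waits being too stringent on slow build machines: one common way to make such waits more forgiving is to poll for the condition and scale the timeout by a per-environment factor. The following is only a sketch under stated assumptions, not what TestReplication actually does. The class `ScaledWait`, its methods, and the system property `test.timeout.factor` are hypothetical names invented for this example, and it uses Java 8 lambdas for brevity.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition with a timeout that slow machines can stretch. */
public final class ScaledWait {
    /** Hypothetical property name; a slow build slave could set e.g. -Dtest.timeout.factor=3 */
    public static final String FACTOR_PROPERTY = "test.timeout.factor";

    private ScaledWait() {}

    /** Multiplies the base timeout by the configured factor (never less than 1). */
    public static long scaledTimeoutMs(long baseTimeoutMs) {
        double factor = Double.parseDouble(System.getProperty(FACTOR_PROPERTY, "1.0"));
        return (long) (baseTimeoutMs * Math.max(1.0, factor));
    }

    /**
     * Polls the condition until it holds or the scaled timeout expires.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitFor(long baseTimeoutMs, long pollIntervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + scaledTimeoutMs(baseTimeoutMs) * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean met = waitFor(2_000, 50, () -> System.currentTimeMillis() - start >= 200);
        System.out.println("condition met: " + met);
    }
}
```

With something along these lines a test would keep its tight default locally, while an old, slow Jenkins slave could pass a larger factor instead of having the timeout hard-coded in the test.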
|