Re: State of the 0.94 tests
I looked back through the failures. I had recently enabled all "ubuntu" build vms for the 0.94 builds.
It turns out that most of the environment issues occur on ubuntu2. I excluded that from the build vms.
-- Lars

________________________________
 From: Andrew Purtell <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Sunday, October 7, 2012 1:36 AM
Subject: Re: State of the 0.94 tests
 
Too many open files usually is an environment issue.
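
A quick way to check whether a build machine is close to its descriptor limit is sketched below. This is an illustration only, not something from the thread: the helper names `fd_limits` and `open_fd_count` are hypothetical, and the count assumes a Unix-like host that exposes `/proc/self/fd` or `/dev/fd`.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors currently open in this process.

    Assumes /proc/self/fd (Linux) or /dev/fd (other Unix-like systems) exists.
    """
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise OSError("no descriptor directory found on this platform")


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit={soft} hard limit={hard} currently open={open_fd_count()}")
```

Running this on each build VM and comparing the soft limits would show whether one host is configured with a lower limit than the others.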

Lars, you should consider setting up a private Jenkins as a sanity check.

On Oct 7, 2012, at 2:41 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Looks like after all that whining I finally got a successful build.
> But I lost confidence in the current 0.94 code line.
>
> Still, it is possible that all of these were environmental issues. If we can get a few more successful runs, it could be OK.
>
> -- Lars
>
>
>
> ________________________________
> From: lars hofhansl <[EMAIL PROTECTED]>
> To: hbase-dev <[EMAIL PROTECTED]>
> Sent: Saturday, October 6, 2012 11:11 PM
> Subject: State of the 0.94 tests
>
> I've been trying (essentially the entire day) to get a successful Jenkins build for 0.94 (triggering the test run periodically from my phone). Not a *single* run succeeded.
> This is clearly not acceptable. Something is off.
>
> The tests that fail most frequently are:
> - TestSplitTransactionOnCluster.testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback
> - TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState
> (The cause of failure is usually too many open files, but they also fail because of unavailable regions.)
>
> Both tests were added recently (since 0.94.2RC2). See HBASE-6854 and HBASE-6853.
>
> Either there is something wrong with the tests, or we introduced some problems in the code base.
>
> Note that I am not dinging these two changes specifically. Both were fixes with a lot of thought and care behind them.
>
> There are also various time out issues in other tests.
>
> These were all the fixes added since the last RC:
> [HBASE-4565] - Maven HBase build broken on cygwin with copynativelib.sh call
> [HBASE-6299] - RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
> [HBASE-6679] - RegionServer aborts due to race between compaction and split
> [HBASE-6688] - folder referred by thrift demo app instructions is outdated
> [HBASE-6854] - Deletion of SPLITTING node on split rollback should clear the region from RIT
> [HBASE-6871] - HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks
> [HBASE-6888] - HBase scripts ignore any HBASE_OPTS set in the environment
> [HBASE-6889] - Ignore source control files with apache-rat
> [HBASE-6900] - RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
> [HBASE-6901] - Store file compactSelection throws ArrayIndexOutOfBoundsException
> [HBASE-6906] - TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException
> [HBASE-6912] - Filters are not properly applied in certain cases
> [HBASE-6916] - HBA logs at info level errors that won't show in the shell
> [HBASE-6920] - On timeout connecting to master, client can get stuck and never make progress
> [HBASE-6927] - WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS
> [HBASE-6946] - JavaDoc missing from release tarballs
> [HBASE-5582] - "No HServerInfo found for" should be a WARNING message
> [HBASE-6914] - Scans/Gets/Mutations don't give a good error if the table is disabled.
> [HBASE-6853] - IllegalArgument Exception is thrown when an empty region is spliitted.
>
> Unless somebody (Ram :) ) speaks up I will roll back HBASE-6854 and HBASE-6853 (and maybe HBASE-6299)
>
> I could also roll all of these back except HBASE-6920 (which is the one that sank the last RC) and leave the rest for the next RC.