HBase, mail # dev - Getting unit tests to pass


Re: Getting unit tests to pass
Lars Francke 2013-07-23, 06:54
Slightly related, sorry for hijacking: I can't get HBase trunk to
build. In particular TestHCM.testClusterStatus always fails for me. I
tried on my own Jenkins as well as my IDE (IntelliJ) with the same
result (two different machines, CentOS & Mac OS).

mvn -U -PrunAllTests -Dmaven.test.redirectTestOutputToFile=true
-Dit.test=noItTest clean install
<http://pastebin.com/upFjq09A>

From my MacBook's command line I got the test to pass using the same
command but not in Jenkins or from IntelliJ.

I'm happy to post in a new thread if this is distracting and no one
else has seen this before.

Any ideas?

Thanks,
Lars
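
A side note for anyone comparing the ResourceChecker "after:" lines quoted further down: they appear to follow a regular `Name=value (was old)` pattern, so the before/after deltas can be pulled out mechanically. The following is only a rough sketch. The function names `parse_resource_line` and `increases` are made up for illustration and are not part of HBase, and the format is inferred solely from the lines quoted in this thread.

```python
import re

# Matches fragments such as "Thread=39 (was 31)".
# Format inferred from the log lines quoted in this thread (an assumption).
METRIC = re.compile(r"(\w+)=(\d+)\s+\(was\s+(\d+)\)")


def parse_resource_line(line):
    """Return {metric: (before, after)} for every 'Name=N (was M)' fragment."""
    return {
        name: (int(before), int(after))
        for name, after, before in METRIC.findall(line)
    }


def increases(metrics):
    """Return {metric: delta} for the metrics that went up during the test."""
    return {
        name: after - before
        for name, (before, after) in metrics.items()
        if after > before
    }


# The TestLogRolling line from the thread, joined back onto one line.
sample = (
    "Thread=39 (was 31) - Thread LEAK? -, OpenFileDescriptor=312 (was 272) "
    "- OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), "
    "SystemLoadAverage=351 (was 368), ProcessCount=144 (was 142) "
    "- ProcessCount LEAK? -, AvailableMemoryMB=906 (was 1995), "
    "ConnectionCount=0 (was 0)"
)

metrics = parse_resource_line(sample)
print(increases(metrics))
```

For that sample, the metrics that went up are the same three the checker marks with "LEAK?". An increase across a single test is only a hint, not proof of a leak.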

On Tue, Jul 23, 2013 at 7:01 AM, Stack <[EMAIL PROTECTED]> wrote:
> nvm.  I read the resourcechecker code.  It is just printing out before and
> afters so my speculation that we are up against fd limits is just off.
>
> Back to figuring out why tests fail at random....
>
> St.Ack
>
>
> On Mon, Jul 22, 2013 at 9:50 PM, Stack <[EMAIL PROTECTED]> wrote:
>
>> Here is another from tail of
>> https://issues.apache.org/jira/browse/HBASE-5995
>>
>> 2013-07-23 01:23:29,574 INFO  [pool-1-thread-1]
>> hbase.ResourceChecker(171): after:
>> regionserver.wal.TestLogRolling#testLogRollOnPipelineRestart Thread=39 (was
>> 31) - Thread LEAK? -, OpenFileDescriptor=312 (was 272) - OpenFileDescriptor
>> LEAK? -, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=351 (was
>> 368), ProcessCount=144 (was 142) - ProcessCount LEAK? -,
>> AvailableMemoryMB=906 (was 1995), ConnectionCount=0 (was 0)
>>
>> This one showed up as a zombie too; stuck.
>>
>> Or here, https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/,
>> where we'd had a nice run of passing tests, all of a sudden a test that
>> I've not seen fail before fails:
>>
>> https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/4282/
>>
>>
>> org.apache.hadoop.hbase.master.TestActiveMasterManager.testActiveMasterManagerFromZK
>>
>> Near the end of the test, the resource checker reports:
>>
>>  - Thread LEAK? -, OpenFileDescriptor=100 (was 92) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=328 (was 331), ProcessCount=138 (was 138), AvailableMemoryMB=1223 (was 1246), ConnectionCount=0 (was 0)
>>
>>
>>
>> Getting tests to pass on these build boxes (other than hadoopqa which is a
>> different set of machines) seems unattainable.
>>
>> I will write infra about the 40k to see if they can do something about
>> that.
>>
>> St.Ack
>>
>>
>>
>>
>> On Mon, Jul 22, 2013 at 9:13 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>>> By way of illustration of how loaded Apache build boxes can be:
>>>
>>> Thread LEAK? -, OpenFileDescriptor=174 (was 162) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=351 (was 383), ProcessCount=142 (was 144), AvailableMemoryMB=819 (was 892), ConnectionCount=0 (was 0)
>>>
>>> This seems to have caused a test that usually passes to fail:
>>> https://issues.apache.org/jira/browse/HBASE-9023
>>>
>>> St.Ack
>>>
>>>
>>> On Mon, Jul 22, 2013 at 11:49 AM, Stack <[EMAIL PROTECTED]> wrote:
>>>
>>>> Below is the state of the hbase 0.95/trunk unit tests (includes a little
>>>> taxonomy of test failure types).
>>>>
>>>> On Andrew's ec2 build box, 0.95 is passing most of the time:
>>>>
>>>> http://54.241.6.143/job/HBase-0.95/
>>>> http://54.241.6.143/job/HBase-0.95-Hadoop-2/
>>>>
>>>> It is not as good on Apache build box but it is getting better:
>>>>
>>>> https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.95/
>>>> https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.95-on-hadoop2/
>>>>
>>>> On Apache, I have seen loads up in the 500s and all file descriptors
>>>> used according to the little resources report printed at the end of each
>>>> test.  If these numbers are to be believed (TBD), we may never achieve 100%
>>>> pass rate on Apache builds.
>>>>
>>>> Andrew's ec2 builds run the integration tests too where the apache
>>>> builds do not -- sometimes we'll fail an integration test run which makes