HBase >> mail # dev >> Getting unit tests to pass


Re: Getting unit tests to pass
Here is another, from the tail of
https://issues.apache.org/jira/browse/HBASE-5995

2013-07-23 01:23:29,574 INFO  [pool-1-thread-1] hbase.ResourceChecker(171):
after: regionserver.wal.TestLogRolling#testLogRollOnPipelineRestart
Thread=39 (was 31) - Thread LEAK? -, OpenFileDescriptor=312 (was 272) -
OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000),
SystemLoadAverage=351 (was 368), ProcessCount=144 (was 142) - ProcessCount
LEAK? -, AvailableMemoryMB=906 (was 1995), ConnectionCount=0 (was 0)
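For what it's worth, those ResourceChecker lines are regular enough to diff mechanically. A minimal sketch, assuming the `metric=after (was before)` format matches the sample above (the regex and metric names are read off that sample, not any HBase API):

```python
import re

# Each ResourceChecker line reports metric=after (was before) pairs; the
# pattern below is taken from the sample output above, not an official format.
METRIC_RE = re.compile(r'(\w+)=(\d+) \(was (\d+)\)')

def parse_resource_line(line):
    """Map each metric name to its (after, before) pair."""
    return {name: (int(after), int(before))
            for name, after, before in METRIC_RE.findall(line)}

def suspected_leaks(metrics, watch=('Thread', 'OpenFileDescriptor', 'ConnectionCount')):
    """Metrics that grew across the test -- the same heuristic the checker
    hints at with its 'LEAK?' markers."""
    return {name: after - before
            for name, (after, before) in metrics.items()
            if name in watch and after > before}

line = ('Thread=39 (was 31) - Thread LEAK? -, OpenFileDescriptor=312 (was 272) - '
        'OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), '
        'SystemLoadAverage=351 (was 368), ProcessCount=144 (was 142), '
        'AvailableMemoryMB=906 (was 1995), ConnectionCount=0 (was 0)')
print(suspected_leaks(parse_resource_line(line)))  # thread and fd counts both grew here
```

Run over a whole build log, this would surface the tests whose thread or descriptor counts climb.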

This one showed up as a zombie too; stuck.

Or here, https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/,
where we'd had a nice run of passing tests, all of a sudden a test that I've
never seen fail before fails:

https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/4282/

org.apache.hadoop.hbase.master.TestActiveMasterManager.testActiveMasterManagerFromZK

Near the end of the test, the resource checker reports:

 - Thread LEAK? -, OpenFileDescriptor=100 (was 92) -
OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000),
SystemLoadAverage=328 (was 331), ProcessCount=138 (was 138),
AvailableMemoryMB=1223 (was 1246), ConnectionCount=0 (was 0)

Getting tests to pass on these build boxes (other than hadoopqa which is a
different set of machines) seems unattainable.

I will write infra about the 40k to see if they can do something about that.
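For anyone poking at one of these boxes directly, that 40000 is the per-process open-file cap (RLIMIT_NOFILE), which can be checked from the test JVM's host; a quick sketch (the numbers are whatever the host's ulimit configuration says, nothing HBase-specific):

```python
import resource

# RLIMIT_NOFILE is the per-process open-file cap that surfaces as
# MaxFileDescriptor=40000 in the ResourceChecker reports above.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f'open files: soft={soft} hard={hard}')
```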

St.Ack
On Mon, Jul 22, 2013 at 9:13 PM, Stack <[EMAIL PROTECTED]> wrote:

> By way of illustration of how loaded Apache build boxes can be:
>
> Thread LEAK? -, OpenFileDescriptor=174 (was 162) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=351 (was 383), ProcessCount=142 (was 144), AvailableMemoryMB=819 (was 892), ConnectionCount=0 (was 0)
>
> This seems to have caused a test that usually passes to fail:
> https://issues.apache.org/jira/browse/HBASE-9023
>
> St.Ack
>
>
> On Mon, Jul 22, 2013 at 11:49 AM, Stack <[EMAIL PROTECTED]> wrote:
>
>> Below is the state of hbase 0.95/trunk unit tests (includes a little
>> taxonomy of test failure type definitions).
>>
>> On Andrew's ec2 build box, 0.95 is passing most of the time:
>>
>> http://54.241.6.143/job/HBase-0.95/
>> http://54.241.6.143/job/HBase-0.95-Hadoop-2/
>>
>> It is not as good on Apache build box but it is getting better:
>>
>> https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.95/
>> https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.95-on-hadoop2/
>>
>> On Apache, I have seen loads up in the 500s and all file descriptors used
>> according to the little resources report printed at the end of each test.
>>  If these numbers are to be believed (TBD), we may never achieve 100% pass
>> rate on Apache builds.
>>
>> Andrew's ec2 builds run the integration tests too where the apache builds
>> do not -- sometimes we'll fail an integration test run which makes the
>> Andrew ec2 red/green ratio look worse than it actually is.
>>
>> Trunk builds lag.  They are being worked on.
>>
>> We seem to be over the worst of the flakey unit tests.  We have a few
>> stragglers still but they are being hunted down by the likes of the
>> merciless Jimmy Xiang and Jeffrey Zhong.
>>
>> The "zombies" have been mostly nailed too (where "zombies" are tests that
>> refuse to die, continuing after the suite has completed and causing the
>> build to fail).  The zombie trap from test-patch.sh was ported over to the
>> apache and ec2 builds and it caught the last of the undying.
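A rough sketch of what such a trap can look like: once the suite should be done, any surefire-forked JVM still alive is a zombie. The fork marker names below are assumptions about how forked test JVMs typically show up in `jps -l` output (they vary by surefire version), not what test-patch.sh literally does:

```python
import subprocess

# Surefire-forked test JVMs usually show up in `jps -l` with names like
# these; the exact markers vary by surefire version, so these are guesses.
FORK_MARKERS = ('surefirebooter', 'ForkedBooter')

def find_zombies(jps_output, markers=FORK_MARKERS):
    """Pids of test JVMs still alive after the suite should have finished."""
    zombies = []
    for line in jps_output.splitlines():
        pid, _, name = line.partition(' ')
        if pid.isdigit() and any(m in name for m in markers):
            zombies.append(int(pid))
    return zombies

def current_jps():
    # `jps -l` (a JDK tool) lists JVM pids with their main class or jar path.
    return subprocess.run(['jps', '-l'], capture_output=True, text=True).stdout

sample = ('12345 org.apache.maven.surefire.booter.ForkedBooter\n'
          '23456 sun.tools.jps.Jps\n')
print(find_zombies(sample))
```

On a build box you'd feed `current_jps()` into `find_zombies` after the surefire run returns and fail the build if the list is non-empty.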
>>
>> We are now into a new phase where "all" tests pass but the build still
>> fails.  Here is an example:
>> http://54.241.6.143/job/HBase-TRUNK/429/org.apache.hbase$hbase-server/
>> The only clue I have to go on is the fact that when we fail, the number of
>> tests run is less than the total that shows for a successful run.
>>
>> Unless anyone has a better idea, to figure out why the hang, I compare the
>> list of tests that show in a good run vs. those of a bad run.  Tests that
>> are in the good run but missing from the bad run are deemed suspect.
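The good-run vs. bad-run comparison described above is mechanical enough to script. A minimal sketch, assuming each run's executed tests have been scraped into a plain one-name-per-line list (that file format is my assumption; the test names below are just ones from this thread):

```python
def suspect_tests(good_run, bad_run):
    """Tests present in a passing run's list but absent from a hung run's
    list are the prime suspects for where the bad run got stuck."""
    good = {line.strip() for line in good_run.splitlines() if line.strip()}
    bad = {line.strip() for line in bad_run.splitlines() if line.strip()}
    return sorted(good - bad)

good = 'TestActiveMasterManager\nTestLogRolling\nTestAssignmentManager\n'
bad = 'TestActiveMasterManager\nTestAssignmentManager\n'
print(suspect_tests(good, bad))  # the test missing from the bad run
```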