NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # dev >> recent 0.94 failures


lars hofhansl 2013-01-24, 04:03
Ted Yu 2013-01-24, 04:07
Ted Yu 2013-01-24, 04:34
lars hofhansl 2013-01-24, 05:26
lars hofhansl 2013-01-25, 07:16
Ted Yu 2013-01-25, 15:11
lars hofhansl 2013-01-25, 22:02
Sergey Shelukhin 2013-01-25, 22:37
Re: recent 0.94 failures
I would also note that these failures are qualitatively different from what I have seen previously:
- The tests failing are seemingly random
- I have run some of these failing tests in a loop for hours, but have not seen any failures locally
Most tests I looked at failed because they rely on wall-clock time (the test times out, or waits in a loop for something to happen).
It almost seems as if the build VMs suddenly introduce arbitrary wait times.
-- Lars
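The failure mode described above, a test that polls for a condition against a fixed wall-clock budget, can be sketched as follows. `WaitUtil`, `waitFor` and `finishesAfter` are hypothetical names used only for illustration (they are not HBase test utilities), and the "slow host" is simulated by making the same work take longer:

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition until it holds or a wall-clock budget runs out. */
final class WaitUtil {

    private WaitUtil() {}

    /**
     * Polls condition every intervalMs milliseconds. Returns true if it became
     * true within timeoutMs of elapsed real time, false on timeout or interrupt.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        // Fixed real-time budget: a slower host gets no extra time.
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        // One final check once the deadline has passed.
        return condition.getAsBoolean();
    }

    /** Simulated work: a condition that becomes true workMs after this call. */
    static BooleanSupplier finishesAfter(long workMs) {
        long start = System.nanoTime();
        return () -> System.nanoTime() - start >= workMs * 1_000_000L;
    }

    public static void main(String[] args) {
        // Same 500 ms budget; only the simulated speed of the host changes.
        boolean fastHost = waitFor(500, 10, finishesAfter(50));
        boolean slowHost = waitFor(500, 10, finishesAfter(1_500));
        System.out.println("fast host passed: " + fastHost);
        System.out.println("slow host passed: " + slowHost);
    }
}
```

The budget is identical in both calls; only the simulated work duration changes. This is why such a test can pass locally for hours and still time out on a slower or overloaded build VM.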

________________________________
 From: Sergey Shelukhin <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Cc: Ted Yu <[EMAIL PROTECTED]>
Sent: Friday, January 25, 2013 2:37 PM
Subject: Re: recent 0.94 failures
 
I see some timeout failures on trunk too. Could these have the same cause?

On Fri, Jan 25, 2013 at 2:02 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> More failures. In one run TestSplitTransactionOnCluster didn't finish; in the last run TestHBaseFsck did not finish.
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Friday, January 25, 2013 7:11 AM
> Subject: Re: recent 0.94 failures
>
>
> Looking at https://builds.apache.org/job/HBase-0.94/771/console :
>
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 44:01.553s
>
> FYI
>
> On Thu, Jan 24, 2013 at 11:16 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>>Got a lot of failures in tests that I have not seen fail before.
>>It looks like the test VMs collectively got slower. Test times are up from ~45 mins to ~70 mins.
>>
>>Lots of the recent failures are due to tests timing out.
>>
>>
>>-- Lars
>>
>>
>>
>>________________________________
>> From: lars hofhansl <[EMAIL PROTECTED]>
>>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>Sent: Wednesday, January 23, 2013 9:26 PM
>>
>>Subject: Re: recent 0.94 failures
>>
>>Hmm... Also got a successful run now.
>>Maybe it was a temporary environment issue. It is just strange that the same test would suddenly fail twice in a row, along with other tests that have not failed in a while.
>>
>>Looking at the runtime of TestMiniClusterLoadParallel: on Ubuntu1 it took 104s; in the latest run, on Ubuntu5, it took 292s.
>>In the failed runs it took over 500s.
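The TestMiniClusterLoadParallel runtimes quoted above translate into the following slowdown factors relative to the Ubuntu1 run. This is simple arithmetic on the reported figures; the failed-run ratio is only a lower bound because that runtime was reported as "over 500s":

```latex
\frac{t_{\text{Ubuntu5}}}{t_{\text{Ubuntu1}}} = \frac{292\ \text{s}}{104\ \text{s}} \approx 2.8,
\qquad
\frac{t_{\text{failed}}}{t_{\text{Ubuntu1}}} > \frac{500\ \text{s}}{104\ \text{s}} \approx 4.8
```

Assuming the slowdown applied roughly uniformly across the test, any fixed wall-clock timeout with less than about 4.8x headroom over the Ubuntu1 runtime could have been exceeded in the failed runs.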
>>
>>-- Lars
>>________________________________
>>From: Ted Yu <[EMAIL PROTECTED]>
>>To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>>Sent: Wednesday, January 23, 2013 8:34 PM
>>Subject: Re: recent 0.94 failures
>>
>>I ran the tests for 4 rounds and they all passed:
>>~/runtest.sh 4
>>TestLruBlockCache,TestMiniClusterLoadParallel,TestLruBlockCache,TestCompactionState,TestRSKilledWhenMasterInitializing
>>
>>FYI
>>
>>On Wed, Jan 23, 2013 at 8:07 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> Lars:
>>> Here is what I put in HBASE-7638:
>>>
>>> Sergey and I looked at the patch.
>>> There is no potential for a NullPointerException similar to the one the HBASE-7268 addendum fixes.
>>> See deleteCachedLocation():
>>> {code}
>>>           if (oldLocation != null) {
>>>             isStaleDelete = (source != null) && !oldLocation.equals(source);
>>> {code}
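The excerpt above relies on the outer null check plus short-circuit evaluation of `&&`. The following self-contained sketch shows the same guard in a toy cache. `LocationCache`, its `String` locations, and the removal semantics (a null `source` deletes unconditionally, a non-matching `source` marks the delete as stale and keeps the entry) are assumptions for illustration, not HBase's actual connection code:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified region-location cache keyed by row. */
final class LocationCache {
    private final Map<String, String> cache = new HashMap<>();

    void put(String row, String location) {
        cache.put(row, location);
    }

    String get(String row) {
        return cache.get(row);
    }

    /**
     * Removes the cached location for row unless the caller reported a source
     * that differs from it (a stale delete). Returns true if an entry was removed.
     */
    boolean deleteCachedLocation(String row, String source) {
        String oldLocation = cache.get(row);
        if (oldLocation != null) {
            // oldLocation is non-null in this branch, so calling equals() on it is safe.
            // A null source short-circuits to false, i.e. an unconditional delete.
            boolean isStaleDelete = (source != null) && !oldLocation.equals(source);
            if (!isStaleDelete) {
                cache.remove(row);
                return true;
            }
        }
        return false; // nothing cached, or the delete was stale
    }

    public static void main(String[] args) {
        LocationCache c = new LocationCache();
        c.put("row1", "rs1:60020"); // made-up server address
        System.out.println(c.deleteCachedLocation("row1", "rs2:60020")); // stale: entry kept, prints false
        System.out.println(c.deleteCachedLocation("row1", null));        // unconditional: prints true
        System.out.println(c.deleteCachedLocation("row1", null));        // nothing cached, no NPE: prints false
    }
}
```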
>>> I also ran the tests that failed in recent 0.94 builds and they all passed:
>>>
>>>  mt -Dtest=TestLruBlockCache,TestMiniClusterLoadParallel
>>>  mt -Dtest=TestLruBlockCache
>>>  mt -Dtest=TestCompactionState
>>>  mt -Dtest=TestRSKilledWhenMasterInitializing
>>>
>>> I will also loop the above tests to see if I can get a test failure.
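Looping a suspect test, as described above, amounts to running it many times and counting failures. Here is a minimal in-process sketch, assuming the check can be expressed as a `BooleanSupplier`. `FlakeLoop` and `runRounds` are hypothetical names, and this is not the content of the `runtest.sh` script mentioned in the thread:

```java
import java.util.function.BooleanSupplier;

/** Hypothetical harness: run a check repeatedly and count failed rounds. */
final class FlakeLoop {

    private FlakeLoop() {}

    /** Runs test the given number of rounds; returns how many rounds failed. */
    static int runRounds(int rounds, BooleanSupplier test) {
        int failures = 0;
        for (int i = 1; i <= rounds; i++) {
            boolean passed;
            try {
                passed = test.getAsBoolean();
            } catch (RuntimeException | AssertionError e) {
                // An exception or failed assertion counts as a failed round.
                passed = false;
            }
            if (!passed) {
                failures++;
                System.out.println("round " + i + ": FAILED");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Simulated flaky check that fails on every third call. It is deterministic
        // here so the example is reproducible; real flakiness is not.
        int[] calls = {0};
        int failures = runRounds(10, () -> ++calls[0] % 3 != 0);
        System.out.println(failures + " of 10 rounds failed");
    }
}
```

Because a timing-dependent failure can depend on machine load, a loop that never fails on a fast workstation does not rule out failures on slower build VMs.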
>>>
>>> I understand it is important to have a green 0.94 build, so whether (and what) to roll back is up to you.
>>>
>>> Cheers
>>>
>>>
>>> On Wed, Jan 23, 2013 at 8:03 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>>
>>>> https://builds.apache.org/job/HBase-0.94/
>>>>
>>>>
>>>> Prime suspects are HBASE-7599 (Devaraj) and HBASE-7638 (Sergey).
>>>> Does anybody have any ideas?
>>>>
>>>> Otherwise I'll start by reverting these changes.