Zookeeper >> mail # dev >> Update on my 1270 testing


Patrick Hunt 2011-11-05, 17:14
Mahadev Konar 2011-11-05, 18:59
Flavio Junqueira 2011-11-05, 19:01
Camille Fournier 2011-11-05, 19:15
Flavio Junqueira 2011-11-05, 19:22
Patrick Hunt 2011-11-07, 23:23
Camille Fournier 2011-11-08, 00:20

Re: Update on my 1270 testing
I'm currently trying to wrap up ZOOKEEPER-1292, and I can move to  
early abandonment once I'm done here.

-Flavio

On Nov 8, 2011, at 1:20 AM, Camille Fournier wrote:

> Sorry you're feeling bad, Patrick! We can take it from here.
>
> I would really like to get some clarification on this test from some
> of the LE experts. What does it really mean that this test is failing?
> Is this the sort of failure that means that sometimes we have server
> startup that takes a bit longer because the leader gives up the election,
> or will server startup completely hang due to this? If it's the
> latter, it should be a high priority fix for 3.4, but if it means that
> occasionally startup might have to fail and retry once, it might be
> worth worrying about in 3.4.1.
>
> Thoughts?
>
> C
>
> On Mon, Nov 7, 2011 at 6:23 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>> That's fine (direction re 1-4). However my CI branch 3.4 build failed
>> over the w/e (once out of four runs). This is AFTER "Preparing for
>> release 3.4.0 - take 2" was applied (so testing includes 1270, 1264,
>> etc...)
>>
>> Notice testEarlyLeaderAbandonment is failing. I have attached the log
>> file to ZOOKEEPER-1270 JIRA:
>> https://issues.apache.org/jira/secure/attachment/12502838/testEarlyLeaderAbandonment5.txt.gz
>>
>> java.lang.RuntimeException: Waiting too long
>>        at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:324)
>>        at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testEarlyLeaderAbandonment(QuorumPeerMainTest.java:195)
>>        at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>>
>> Should I reopen 1270, or a new jira, or... ? LMK.
>>
>> Note - I'm feeling quite ill so I have limited time to provide f/b &
>> test for the next day or so.
>>
>> Patrick
>>
>> On Sat, Nov 5, 2011 at 12:22 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>>> I'm fine with your proposal. -Flavio
>>>
>>> On Nov 5, 2011, at 8:15 PM, Camille Fournier wrote:
>>>
>>>> 2 has been flaky for so long, not sure whether it's worth being a blocker.
>>>> The AsyncHammerTests never pass for me locally. Not sure if it's a
>>>> problem or not... I am tempted to go with Mahadev on this and get this
>>>> 3.4 release out the door. I would be happy to help manage a 3.4.1
>>>> release soon thereafter if we find serious issues.
>>>>
>>>> C
>>>>
>>>> On Sat, Nov 5, 2011 at 3:01 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>>>>>
>>>>> If 2) is flakey, we need to fix it, no?
>>>>>
>>>>> -Flavio
>>>>>
>>>>> On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:
>>>>>
>>>>>> I ran the 1270-1194 patch continually overnight (trunk) in my CI env;
>>>>>> after ~25 test runs I saw 4 failures:
>>>>>>
>>>>>> 1) #402 - QuorumTest.testFollowersStartAfterLeader
>>>>>> 2) #407 - org.apache.zookeeper.test.FLETest.testLE
>>>>>> 3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>>>>>> 4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>>>>>>
>>>>>> 1) client could not connect to reestablished quorum: giving up after
>>>>>> 30+ seconds.
>>>>>> 2) known flakey test
>>>>>> 3) QP failed to shutdown in 30 seconds:
>>>>>> QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
>>>>>> 4) QP failed to shutdown in 30 seconds:
>>>>>> QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
>>>>>>
>>>>>> On the plus side no "testearlyleaderabandon" failures.
>>>>>>
>>>>>> On the minus side 3) and 4) are a bit worrisome. Searching back through all
>>>>>> my previous failures I don't see this happening. Perhaps these changes
>>>>>> have shifted some timing? My main concern is that this might be caused
>>>>>> directly by the patch itself....
>>>>>>
>>>>>> Patrick
>>>>>
>>>>> flavio
>>>>> junqueira
>>>>>
>>>>> research scientist
>>>>>
>>>>> [EMAIL PROTECTED]
>>>>> direct +34 93-183-8828
>>>>>
>>>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es

Camille Fournier 2011-11-08, 19:01
Camille Fournier 2011-11-08, 19:25
Patrick Hunt 2011-11-08, 22:01