Zookeeper, mail # user - 15 minutes to sync?


Re: 15 minutes to sync?
Jordan Zimmerman 2012-08-01, 00:46
No, I don't have that data. I'll try to get it next time.

On Jul 31, 2012, at 5:13 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> Any monitoring of mem, gc, disk, etc... that might give some
> additional insight? Perhaps the disks were loaded and that was slowing
> things? Or swapping/gc of the jvm? You might be able to tune to
> resolve some of that.
>
> One thing you can try is copying the snapshot file to an empty
> datadir on a separate machine and try starting a 2 node cluster.
> (where the second node starts with an empty datadir)
>
> Patrick
>
> On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
> <[EMAIL PROTECTED]> wrote:
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations… ?
>>
>> A big problem currently is detritus nodes. People use lock recipes for various movie IDs and they leave garbage parent nodes around in the thousands. I've written some gc tasks to clean them up but it's been a slow process to get everyone to use it. I know there is a Jira to help with this but I don't know the status.
>>
>> -JZ
>>
>> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>>
>>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>>> <[EMAIL PROTECTED]> wrote:
>>>> There were a lot of creations but I removed those nodes last night. How long does it take to clear out of the snapshot?
>>>
>>> The snapshot is a copy of whatever is in the znode tree at the time
>>> the snapshot is taken. (so instantaneous the next time a snapshot is
>>> taken). You can see the dates and the epoch number if that gives you
>>> any insight (epoch is the upper 32 bits of the hex zxid in the filename)
>>>
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>>
>>> Patrick
>>>
>>>>
>>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>>> unexpectedly overload the server with znode creations?
>>>>>
>>>>> When a follower comes up the leader needs to serialize the znodes to
>>>>> the snapshot file, stream it to the follower, who saves it locally
>>>>> then deserializes it. (11g/15min is avg about 12meg/second for this
>>>>> process)
>>>>>
>>>>> Often times this is exacerbated by the max heap and GC interactions.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>> BTW - this is 3.3.5
>>>>>>
>>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> We've had a few outages of our ZK cluster recently. When trying to bring the cluster back up it's been taking 10-15 minutes for the followers to sync with the Leader. Any idea what might cause this? Here's an ls of the data dir:
>>>>>>>
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:39 log.3900a4bc75
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:40 log.3900a634ee
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:21 log.3a00000001
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:22 log.3a000139a2
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723 Jul 31 20:42 snapshot.3900a634ec
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 snapshot.3900a6b149
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423 Jul 31 21:22 snapshot.3a000139a0
>>>>>>>
>>>>>>
>>>>
>>
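
Two points from the thread can be checked with a little arithmetic: the epoch is the upper 32 bits of the hex suffix on each snapshot and log file name, and an 11 GB snapshot synced in 15 minutes averages roughly 12 MB/s. The following is a minimal sketch in Python, not part of ZooKeeper itself. The helper names `split_zxid` and `avg_mb_per_sec` are hypothetical, the file names and sizes are copied from the `ls` output above, and the 15-minute figure is the upper end of the 10-15 minute range reported.

```python
def split_zxid(filename):
    """Split a data file name such as 'snapshot.3900a6b149' into
    (epoch, counter). The suffix after the dot is read as a hex zxid:
    the epoch is the upper 32 bits, the counter the lower 32 bits."""
    suffix = filename.rsplit(".", 1)[1]
    zxid = int(suffix, 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


def avg_mb_per_sec(size_bytes, minutes):
    """Average throughput in MB/s (1 MB = 10**6 bytes) for moving
    size_bytes in the given number of minutes."""
    return size_bytes / (minutes * 60) / 1e6


# (file name, size in bytes) copied from the ls output in the thread
snapshots = [
    ("snapshot.3900a634ec", 9279729723),
    ("snapshot.3900a6b149", 11126306780),
    ("snapshot.3a000139a0", 4153727423),
]

for name, size in snapshots:
    epoch, counter = split_zxid(name)
    print(f"{name}: epoch=0x{epoch:x} counter=0x{counter:x} "
          f"size={size / 1e9:.1f} GB")

# Assumed sync time: 15 minutes for the largest snapshot
rate = avg_mb_per_sec(11126306780, 15)
print(f"average sync rate: {rate:.1f} MB/s")
```

Read this way, the two larger snapshots belong to epoch 0x39 and the 4 GB one to epoch 0x3a, which matches the log files `log.3a00000001` and `log.3a000139a2` written after the restart.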