We had a brute-force cluster shutdown, which was followed by log
recovery when the cluster came back online.
The cluster took hours to split the logs and recover the regions. That
might have made sense, since we have quite a lot of regions (around 13K),
but the weird thing is that there was no obvious bottleneck during the
recovery process: CPU was almost idle on all the nodes, IO was at 5-20%
utilization, memory was OK, and the network wasn't overloaded, yet recovery
was still slow.
Any idea what could be slowing it down?
Ted Yu 2013-06-18, 12:15
Eran Kutner 2013-06-18, 16:43