What Hadoop version are you using?
Can you check the NameNode log to see whether lease recovery took a long time?
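
If it helps, here is a rough Python sketch for getting a first impression from the NameNode log: it counts the timestamped lines that mention "lease" and reports the first and last such timestamps. It rests on assumptions: that each log line starts with a log4j-style timestamp (`yyyy-MM-dd HH:mm:ss,SSS`), and that a plain substring match on "lease" is a good enough filter. The function name `lease_recovery_span` is made up for this example; adjust the pattern and keyword to whatever your log actually contains.

```python
import re
from datetime import datetime

# Assumption: every log line starts with a log4j-style timestamp,
# e.g. "2013-06-18 05:11:00,123". Change the pattern if your layout differs.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def lease_recovery_span(lines, keyword="lease"):
    """Return (count, first_ts, last_ts) for timestamped lines containing keyword.

    The keyword match is a crude, case-insensitive substring filter
    (an assumption, not an official NameNode message format).
    """
    needle = keyword.lower()
    count, first, last = 0, None, None
    for line in lines:
        if needle not in line.lower():
            continue
        match = TS_RE.match(line)
        if not match:
            continue  # skip continuation lines such as stack traces
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(microsecond=int(match.group(2)) * 1000)
        count += 1
        # Use min/max so the result does not depend on line order.
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    return count, first, last
```

To use it, open the NameNode log and pass the file object in, e.g. `with open(path, errors="replace") as f: print(lease_recovery_span(f))`, where `path` is wherever your NameNode log lives. A large gap between the two timestamps would suggest that lease recovery is where the time went.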
On Jun 18, 2013, at 5:11 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> We had a brute force cluster shutdown event that was followed by log
> recovery when the cluster went back online.
> The cluster took hours to split the logs and recover the regions, all of
> which might have made sense since we have quite a lot of regions (around
> 13K) but the weird thing is that there was no obvious bottleneck during the
> recovery process. CPU was almost idle on all the nodes, IO was at 5-20%
> utilization, memory was OK, network wasn't overloaded, but still it was
> slow.
> Any idea what can be slowing it down?