I was restarting the DFS cluster. At first, the DNs did not join. After
I kept stopping and starting each DN, all DNs eventually joined the
NN. However, the NN still doesn't look healthy.
The machine has 16 cores. The NN process's CPU usage stays at about 20%,
and "system CPU" constantly takes up about 50%. Here is the top output:
top - 18:01:34 up 144 days, 16:05, 5 users, load average: 12.65, 12.06, 12.34
Tasks: 363 total, 6 running, 357 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.2%us, 48.7%sy, 0.0%ni, 28.9%id, 0.0%wa, 0.0%hi, 4.1%si, 0.0%st
Mem: 33000560k total, 6449412k used, 26551148k free, 596812k buffers
Swap: 64452600k total, 0k used, 64452600k free, 3318352k cached
It has been in this state for a long time, several hours.
Has anybody seen this before?