Can you post the stack trace and relevant log snippets ?
On Nov 16, 2011, at 2:40 AM, Matthias Hofschen <[EMAIL PROTECTED]> wrote:
> we had a case today where the loadbalancer stopped working.
> (cloudera-cdh3-u1, 52nodes). Basically we had a hot region that we moved to
> another node. Shortly thereafter the regionserver of that region was
> stopped. In the master logs we see that master is trying to contact this
> regionserver to move the region. At the same time we had about 1000 regions
> missing because of the stopped regionserver. These where not assigned to
> another regionserver. I suspect that the the load balancer on the master
> machine was blocked by trying to move the one region which was not
> possible. We then restarted the master which solved the problem.
> Is this a known problem, should I log an issue? I do have a stacktrace from
> the master machine.
> Cheers Matthias