Have you run hbck to diagnose the problem?
Since the issue might have involved HDFS, browsing the DataNode log(s) may
provide some clues as well.
What Hadoop version are you using?
On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:
> When you say that the parent regions got reopened, does that mean that you
> did not lose any data (i.e. there was no data that could not be read)? The
> reason I am asking is that if the parent was split into daughters, data was
> then written to the daughters, and the daughters' files could not be
> opened, you could have ended up unable to read that data.
> Some logs could tell us what caused the parents to be reopened rather than
> the daughters. Another thing I would like to ask: was the cluster brought
> down abruptly by killing the RS?
> Which version of HBase?
> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[EMAIL PROTECTED]> wrote:
> > Hello,
> > I had an interesting problem come up recently. We have a few thousand
> > regions across 8 datanode/regionservers. I made a change, increasing the
> > heap size for Hadoop from 128M to 2048M, which ended up bringing the
> > cluster to a complete halt after about 1 hour. I reverted back to 128M and
> > got things back on again, but didn't realize at the time that I came up
> > with 9 fewer regions than I started with. Upon further investigation, I
> > found that the 9 missing regions were from splits that occurred while the
> > cluster was running after making the heap change and before it came to a
> > halt. There was a 10th region (5 splits involved in total) that managed to
> > get recovered. The really odd thing is that in the case of the other 9
> > regions, the original parent regions, which as far as I can tell from the
> > logs were deleted, were re-opened upon restarting things once again. The
> > daughter regions were gone. Interestingly, I found the orphaned
> > still intact, and in at least some cases have been able to extract the
> > data from them and will hopefully re-add it to the tables.
> > My question is this: does anyone know, based on the rather muddled
> > description I've given above, what could possibly have happened here? My
> > best guess is that the bad state that HDFS was in caused some critical
> > component of the split process to be missed, which resulted in a reference
> > to the parent regions sticking around and the references to the
> > daughter regions being lost.
> > Thanks for any insight you can provide.
> > --Brennon