Re: Lost regions question
Ted Yu 2013-04-12, 16:34
Can you try hbck to see if the problem is repaired?
On Fri, Apr 12, 2013 at 9:27 AM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:
> Oh, sorry to hear that. But I think the data should still be there in the
> system, just not accessible to you. We should be able to bring it back.
> One set of logs that would be of interest is that of the RS and master when
> the split happened.
> And the main thing would be the logs from when you restarted your cluster
> and the Master came back up. That is where the system does some self
> rectification after it checks whether there were any partial splits.
> On Fri, Apr 12, 2013 at 9:34 PM, Brennon Church <[EMAIL PROTECTED]>
> > Hello,
> > We lost the data when the parent regions got reopened. My guess, and
> > only that, is that the regions were essentially empty when they started
> > again in these cases. We definitely lost data from the tables.
> > I've looked through the hdfs and hbase logs and can't find any obvious
> > difference between a successful split and these failed ones. All steps
> > show up the same in all cases. After the handled split message that mentions
> > the parent and daughter regions, the next reference is to the parent
> > regions once again as hbase is started back up after the failure. No
> > further reference to the daughters is made.
> > I couldn't cleanly shut several of the regionservers down, so they were
> > abruptly killed, yes.
> > HBase version is 0.92.0, and hadoop is 1.0.1.
> > Thanks.
> > --Brennon
> > On 4/11/13 10:58 PM, ramkrishna vasudevan wrote:
> >> When you say that the parent regions got reopened, does that mean that
> >> you did not lose any data (that is, no data was unreadable)? The reason I
> >> am asking is that if the parent was split into daughters, data was then
> >> written to the daughters, and the daughters' files could not be opened, you
> >> would have ended up unable to read that data.
> >> Some logs could tell us what made the parent get reopened rather than the
> >> daughters. Another thing I would like to ask: was the cluster brought
> >> down abruptly by killing the RS?
> >> Which version of HBase?
> >> Regards
> >> Ram
> >> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[EMAIL PROTECTED]>
> >> wrote:
> >>> Hello,
> >>> I had an interesting problem come up recently. We have a few thousand
> >>> regions across 8 datanode/regionservers. I made a change, increasing the
> >>> heap size for hadoop from 128M to 2048M, which ended up bringing the
> >>> cluster to a complete halt after about 1 hour. I reverted back to 128M
> >>> and turned things back on again, but didn't realize at the time that I
> >>> came up with 9 fewer regions than I started with. Upon further
> >>> investigation, I found that all 9 missing regions were from splits that
> >>> occurred while the cluster was running after making the heap change and
> >>> before it came to a halt. There was a 10th region (5 splits involved in
> >>> total) that managed to get recovered. The really odd thing is that in the
> >>> case of the other 9 regions, the original parent regions, which as far as
> >>> I can tell in the logs were deleted, were re-opened upon restarting
> >>> things once again. The daughter regions were gone. Interestingly, I found
> >>> the orphaned datablocks still intact, and in at least some cases have
> >>> been able to extract the data from them and will hopefully re-add it to
> >>> the tables.
> >>> My question is this. Does anyone know, based on the rather muddled
> >>> description I've given above, what could have possibly happened here? My
> >>> best guess is that the bad state that hdfs was in caused some critical
> >>> component of the split process to be missed, which resulted in a reference
> >>> to the parent regions sticking around and losing the references to the