HBase, mail # user - Lost regions question


Re: Lost regions question
Ted Yu 2013-04-12, 16:34
Brennon:
Can you try hbck to see if the problem is repaired?

Thanks

On Fri, Apr 12, 2013 at 9:27 AM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> Oh, sorry to hear that.  But I think the data should still be there in the
> system, just not accessible to you.  We should be able to bring it back.
>
> One set of logs that would be of interest is that of the RS and master when
> the split happened.
>
> The main thing to look at would be the point when you restarted your
> cluster and the Master came back up.  That is where the system does some
> self-rectification if it sees that there were partial splits.
>
> Regards
> Ram
>
>
> On Fri, Apr 12, 2013 at 9:34 PM, Brennon Church <[EMAIL PROTECTED]>
> wrote:
>
> > Hello,
> >
> > We lost the data when the parent regions got reopened.  My guess, and
> > it's only that, is that the regions were essentially empty when they
> > started up again in these cases.  We definitely lost data from the tables.
> >
> > I've looked through the hdfs and hbase logs and can't find any obvious
> > difference between a successful split and these failed ones.  All steps
> > show up the same in all cases.  After the handled split message that
> > listed the parent and daughter regions, the next reference is to the
> > parent regions once again as hbase is started back up after the failure.
> > No further reference to the daughters is made.
> >
> > I couldn't cleanly shut several of the regionservers down, so they were
> > abruptly killed, yes.
> >
> > HBase version is 0.92.0, and hadoop is 1.0.1.
> >
> > Thanks.
> >
> > --Brennon
> >
> >
> > On 4/11/13 10:58 PM, ramkrishna vasudevan wrote:
> >
> >> When you say that the parent regions got reopened, does that mean that
> >> you did not lose any data (or was there data that could not be read)?
> >> The reason I am asking is that if the parent got split into daughters,
> >> the data was written to the daughters, and the daughters' files could
> >> not be opened, you could have ended up unable to read the data.
> >>
> >> Some logs could tell us what made the parent get reopened rather than
> >> the daughters.  Another thing I would like to ask: was the cluster
> >> brought down abruptly by killing the RS?
> >>
> >> Which version of HBase?
> >>
> >> Regards
> >> Ram
> >>
> >>
> >>
> >>
> >> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> I had an interesting problem come up recently.  We have a few thousand
> >>> regions across 8 datanode/regionservers.  I made a change, increasing the
> >>> heap size for hadoop from 128M to 2048M, which ended up bringing the
> >>> cluster to a complete halt after about 1 hour.  I reverted back to 128M
> >>> and turned things back on again, but didn't realize at the time that I
> >>> came up with 9 fewer regions than I started with.  Upon further
> >>> investigation, I found that all 9 missing regions were from splits that
> >>> occurred while the cluster was running after making the heap change and
> >>> before it came to a halt.  There was a 10th region (5 splits involved in
> >>> total) that managed to get recovered.  The really odd thing is that in
> >>> the case of the other 9 regions, the original parent regions, which as
> >>> far as I can tell in the logs were deleted, were re-opened upon
> >>> restarting things once again.  The daughter regions were gone.
> >>> Interestingly, I found the orphaned datablocks still intact, and in at
> >>> least some cases have been able to extract the data from them and will
> >>> hopefully re-add it to the tables.
> >>>
> >>> My question is this.  Does anyone know, based on the rather muddled
> >>> description I've given above, what could have possibly happened here?
> >>> My best guess is that the bad state that hdfs was in caused some
> >>> critical component of the split process to be missed, which resulted in
> >>> a reference to the parent regions sticking around and losing the
> >>> references to the