HBase >> mail # dev >> Wondering what hbck should do in this situation


Re: Wondering what hbck should do in this situation
We actually ran into something similar on an upgrade from HBase 0.90 to
HBase 0.92: a few regions would bounce around between regionservers,
failing to open and ending up in the FAILED_OPEN RIT (region in
transition) state.

Here were the repair cases we considered:
1) What do you do if the parent file is not present?  Sideline the
reference files.  This applies to both bulk-load and data files.  Without
the original file we cannot really save anything.  If the parent is not
present, it may have been moved, but its data is still present.
2) What do you do if the parent file is present?  I think you can sideline
the reference files.  The original file is present somewhere in HDFS, so
the data is not lost.
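A minimal sketch of the two cases above, assuming a reference file is named `<parent-hfile-name>.<parent-encoded-region-name>` (an assumption to verify against your HBase version) and that the repair tool has already listed which parent files still exist. The function name and the `parent_hfiles` set are hypothetical, not hbck's API.

```python
import re

# Assumed reference-file naming convention (check it against your HBase
# version): "<parent-hfile-name>.<parent-encoded-region-name>".
REFERENCE_NAME = re.compile(r"^(?P<hfile>[0-9a-f]+)\.(?P<parent>[0-9a-f]+)$")


def plan_reference_repair(file_name, parent_hfiles):
    """Return an (action, reason) pair for one store file in a region dir.

    parent_hfiles is a hypothetical set of (parent_region, hfile_name)
    pairs that the repair tool has confirmed still exist in HDFS.
    """
    match = REFERENCE_NAME.match(file_name)
    if match is None:
        # A plain HFile, not a reference: nothing to repair.
        return ("keep", "not a reference file")
    key = (match.group("parent"), match.group("hfile"))
    if key in parent_hfiles:
        # Case 2: the referenced file still exists, so sidelining the
        # reference does not lose data.
        return ("sideline", "parent file present; data still in HDFS")
    # Case 1: the reference cannot be resolved, so nothing can be read
    # through it.
    return ("sideline", "parent file missing; reference is unresolvable")
```

Both cases end in the same action (sideline); the reason string records whether the data is still reachable through the parent, which is what an operator would want reported.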

Another related idea is to have a quarantine directory for regions/files
that are repeatedly ill-behaved.  For example, if we tried to read a
reference file multiple times and failed, we would quarantine the file and
try again.  We had another case: we ran into a truncated HFile, and the
same strategy would have gotten the cluster working (while still leaving
the possibility of data recovery).
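A minimal sketch of the quarantine idea, assuming a simple per-file failure counter with a fixed threshold. The class name, the threshold, and the directory layout are all hypothetical, and it works on a local filesystem as a stand-in for HDFS. After a file fails to open `max_failures` times, it is moved (not deleted) under a quarantine directory that mirrors its original path.

```python
import os
import shutil
from collections import Counter


class Quarantiner:
    """Quarantine files that repeatedly fail to open instead of deleting them.

    Local-filesystem sketch: a real tool would work against HDFS. The class
    name, the threshold, and the directory layout are hypothetical.
    """

    def __init__(self, root_dir, quarantine_dir, max_failures=3):
        self.root_dir = root_dir
        self.quarantine_dir = quarantine_dir
        self.max_failures = max_failures
        self.failures = Counter()

    def record_failure(self, path):
        """Count one failed open of `path`; move it aside at the threshold.

        Returns the quarantined location if the file was moved, else None.
        """
        self.failures[path] += 1
        if self.failures[path] < self.max_failures:
            return None
        # Mirror the path under root_dir so files from different
        # regions/families cannot collide in the quarantine directory.
        rel = os.path.relpath(path, self.root_dir)
        target = os.path.join(self.quarantine_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Move rather than delete, keeping the file available for recovery.
        shutil.move(path, target)
        del self.failures[path]
        return target
```

With the default `max_failures=3`, the first two calls for a file return `None` and the third moves it and returns the new path; the caller would then retry opening the region without the bad file.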

Jon.

On Wed, Jul 18, 2012 at 9:56 PM, Ramkrishna.S.Vasudevan <
[EMAIL PROTECTED]> wrote:

> J-D,
> A correction: if META does not have an entry, then we cannot know whether
> it was split or not.  Apologies for that.
>
> I think we need to check for reference files and, if opening them fails,
> report it.  That should be the way.
> But we should also confirm whether this region was split properly, right?
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Ramkrishna.S.Vasudevan [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, July 19, 2012 10:21 AM
> > To: '[EMAIL PROTECTED]'
> > Subject: RE: Wondering what hbck should do in this situation
> >
> > J-D
> >
> > Just going through the explanation, I feel that the region that had
> > references is a parent region, and it should have an entry in META
> > saying it is SPLIT and OFFLINE?
> >
> > Maybe, while fixing the cases where we find something in HDFS but not
> > in META, we need to check whether it was split?
> >
> > Was there any reason why the CatalogJanitor was not able to pick up
> > this region for cleanup?
> >
> > I may be wrong here, J-D; just going through the explanation, I am
> > thinking this could be the scenario.
> >
> > Thanks for bringing this up; we will add this to our internal testing
> > as well.
> >
> > Regards
> > Ram
> >
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> > > Jean-Daniel Cryans
> > > Sent: Wednesday, July 18, 2012 9:23 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Wondering what hbck should do in this situation
> > >
> > > Hey devs,
> > >
> > > I encountered an "interesting" situation with hbck in 0.94: we had a
> > > region that was on HDFS but not in .META., and hbck decided to add it
> > > back:
> > >
> > > ERROR: Region { meta => null, hdfs =>
> > > hdfs://sfor3s24:10101/hbase/url_stumble_summary/159952764, deployed =>
> > >  } on HDFS, but not listed in META or deployed on any region server
> > > 12/07/17 23:46:03 INFO util.HBaseFsck: Patching .META. with
> > > .regioninfo: {NAME =>
> > > 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> > > '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
> > > 159952764,}
> > >
> > > Then, when hbck tried to assign the region, it got bounced between
> > > region servers:
> > >
> > > Trying to reassign region...
> > > 12/07/17 23:46:04 INFO util.HBaseFsckRepair: Region still in
> > > transition, waiting for it to become assigned: {NAME =>
> > > 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> > > '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
> > > 159952764,}
> > > 12/07/17 23:46:05 INFO util.HBaseFsckRepair: Region still in
> > > transition, waiting for it to become assigned: {NAME =>
> > > 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> > > '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]