Re: Never ending transtionning regions.
Jean-Marc Spaggiari 2013-02-24, 14:43
Removing user.
What I did yesterday is:
- Merged a table to have big regions.
- Altered the table to have those regions split.
- Ran a major_compact.
- Stopped HBase before all of that ended.

I tried again yesterday evening but was not able to reproduce it. I will try again today and keep the list posted.

2013/2/23 Kevin O'dell <[EMAIL PROTECTED]>

> +Dev
>
> I think number 1 is that we fix whatever is leaving regions in this state. I
> think we could put logic into hbck for this.
>
> On Sat, Feb 23, 2013 at 7:36 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> > Hi Kevin,
> >
> > I stopped HBase to merge some regions so I already had to deal with the
> > downtime. But with the online merge coming it's very good to know the
> > online way to do it.
> >
> > Now, is there an automated way to do it? In HBCK? Maybe we can check each
> > region for links, check that those links exist, and if not, remove them?
> > Or would it be too risky?
> >
> > JM
> >
> > 2013/2/23 Kevin O'dell <[EMAIL PROTECTED]>
> >
> > > JM,
> > >
> > > Here is what I am seeing:
> > >
> > > 2013-02-23 15:46:14,630 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=entry,ac.adanac-oidar.www\x1Fhttp\x1F-1\x1F/sports/patinage/2012/04/04/001-artistique-trophee-mondial.shtml\x1Fnull,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435., starting to roll back the global memstore size.
> > >
> > > If you checked 6dd77bc9ff91e0e6d413f74e670ab435 you should have seen some
> > > pointer files to 2ebfef593a3d715b59b85670909182c9. Typically, you would
> > > see the storefiles in 6dd77bc9ff91e0e6d413f74e670ab435, and
> > > 2ebfef593a3d715b59b85670909182c9 would have been empty from a bad split.
> > > What I do is delete the pointers that don't reference any storefiles.
> > > Then you can clear the unassigned folder in zkCli. Finally, run an
> > > unassign on the RITs. This way there is no downtime and you don't have
> > > to drop any tables.
> > >
> > > On Sat, Feb 23, 2013 at 6:14 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > Thanks for taking the time to reply.
> > > >
> > > > Here is a bigger extract of the logs. I don't see another path in the
> > > > logs.
> > > >
> > > > http://pastebin.com/uMxGyjKm
> > > >
> > > > I can send you the entire log if you want (42 MB).
> > > >
> > > > What I did is I merged many regions together, then altered the table to
> > > > set the max_filesize and started a major_compaction to get the table
> > > > split.
> > > >
> > > > To fix the issue I had to drop one working table and run -repair
> > > > multiple times. Now it's fixed, but I still have the logs.
> > > >
> > > > I'm redoing all the steps I did. Maybe I will face the issue again. If
> > > > I'm able to reproduce it, we might be able to figure out where the
> > > > issue is...
> > > >
> > > > JM
> > > >
> > > > 2013/2/23 Kevin O'dell <[EMAIL PROTECTED]>
> > > >
> > > > > JM,
> > > > >
> > > > > How are you doing today? Right before the "file does not exist"
> > > > > error there should be another path. Can you let me know if in that
> > > > > path there are pointers from a split to
> > > > > 2ebfef593a3d715b59b85670909182c9? The directory may already exist.
> > > > > I have seen this a couple of times now and am trying to ferret out
> > > > > a root cause to open a JIRA with. I suspect we have a split code
> > > > > bug in .92+
> > > > >
> > > > > On Sat, Feb 23, 2013 at 4:10 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have 2 regions transitioning from server to server for 15
> > > > > > minutes now.
> > > > > >
> > > > > > I have nothing in the master logs about those 2 regions, but in
> > > > > > the region server logs I have some files not found:
> > > > > >
> > > > > > 2013-02-23 16:02:07,347
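
The check Kevin describes in the thread (finding pointer files in a region that don't reference any storefile) could be sketched roughly as below. This is only an illustration, not what hbck does: it assumes a local copy of the table directory laid out as `<table>/<region>/<family>/<file>`, and it assumes pointer files are named `<storefile>.<parent region>`. The function name `find_dangling_references` and the regex are made up for this example. It only reports candidates and deletes nothing.

```python
import re
from pathlib import Path

# Assumption: a pointer (reference) file is named
# "<storefile name>.<encoded parent region name>", both lowercase hex.
REF_NAME = re.compile(r"^(?P<hfile>[0-9a-f]+)\.(?P<parent>[0-9a-f]+)$")


def find_dangling_references(table_dir):
    """Return pointer files whose target storefile is missing.

    table_dir is assumed to be a local copy of a table directory with
    the layout <table>/<region>/<family>/<file>.
    """
    table_dir = Path(table_dir)
    dangling = []
    for ref in sorted(table_dir.glob("*/*/*")):
        match = REF_NAME.match(ref.name)
        if not ref.is_file() or match is None:
            continue  # plain storefile, directory, or unrelated file
        family = ref.parent.name
        # The pointer should resolve to <table>/<parent>/<family>/<storefile>.
        target = table_dir / match["parent"] / family / match["hfile"]
        if not target.is_file():
            dangling.append(ref)
    return dangling
```

Any path it returns is only a candidate for the manual cleanup described in the thread; clearing the unassigned folder in zkCli and unassigning the RITs would still be done by hand afterwards.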