Re: Errors after major compaction
No, but I did run a major compaction.
As I explained initially, I disabled the table so I could change its TTL,
then re-enabled it and ran a major compaction so it would clean up the
data that had expired due to the TTL change.

-eran
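
For reference, the disable / alter-TTL / enable / major-compact sequence
described above can be driven from the Java client. This is a minimal
sketch against the HBaseAdmin API of that era; the table name comes from
this thread, while the column family name "e" and the 7-day TTL are
made-up placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ChangeTtlThenCompact {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    byte[] table = Bytes.toBytes("gs_raw_events"); // table name from this thread

    admin.disableTable(table);                     // table must be offline to alter it
    HTableDescriptor desc = admin.getTableDescriptor(table);
    HColumnDescriptor col = desc.getFamily(Bytes.toBytes("e")); // hypothetical family name
    col.setTimeToLive(7 * 24 * 60 * 60);           // hypothetical new TTL: 7 days, in seconds
    admin.modifyTable(table, desc);
    admin.enableTable(table);

    // A major compaction rewrites every store file and drops cells
    // that the new, shorter TTL has expired.
    admin.majorCompact("gs_raw_events");
  }
}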

On Wed, Jul 6, 2011 at 02:43, Ted Yu <[EMAIL PROTECTED]> wrote:

> Eran:
> You didn't run hbck during the enabling of the gs_raw_events table, right?
>
> I saw:
> 2011-06-29 16:43:50,395 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> Compaction (major) requested for
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> because User-triggered major compaction; priority=1, compaction queue size=1248
>
> The above might be related to:
> >> 2011-06-29 16:43:57,880 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> >> Region has been PENDING_OPEN for too long, reassigning
> >> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>
> Thanks
>
> On Tue, Jul 5, 2011 at 7:19 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Eran:
> > I logged https://issues.apache.org/jira/browse/HBASE-4060 for you.
> >
> >
> > On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks for the understanding.
> >>
> >> Can you log a JIRA and put your ideas below in it?
> >>
> >>
> >>
> >> On Jul 4, 2011, at 12:42 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >>
> >> > Thanks for the explanation, Ted.
> >> >
> >> > I will try to apply HBASE-3789 and hope for the best, but my
> >> > understanding is that it doesn't really solve the problem, it only
> >> > reduces the probability of it happening, at least in one particular
> >> > scenario. I would hope for a more robust solution.
> >> > My concern is that the region allocation process seems to rely too
> >> > much on timing considerations and doesn't seem to take enough
> >> > measures to guarantee that conflicts do not occur. I understand that
> >> > in a distributed environment, when you don't get a timely response
> >> > from a remote machine, you can't know for sure whether it did or did
> >> > not receive the request; however, there are things that can be done
> >> > to mitigate this and reduce the conflict time significantly. For
> >> > example, when I run hbck it knows that some regions are multiply
> >> > assigned; the master could do the same and try to resolve the
> >> > conflict. Another approach would be to handle late responses: even
> >> > if the response from the remote machine arrives after it was assumed
> >> > to be dead, the master should have enough information to know it had
> >> > created a conflict by assigning the region to another server. An
> >> > even better solution, I think, is for the RS to periodically verify
> >> > that it is indeed the rightful owner of every region it holds and to
> >> > relinquish control over any region it no longer owns (a sketch of
> >> > this idea follows below).
> >> > Obviously a state where two RSs hold the same region is pathological
> >> > and can lead to data loss, as demonstrated in my case. The system
> >> > should be able to actively protect itself against such a scenario.
> >> > It probably doesn't need saying, but there is really nothing worse
> >> > for a data storage system than data loss.
> >> >
> >> > In my case the problem didn't happen in the initial phase but after
> >> > disabling and enabling a table with about 12K regions.
> >> >
> >> > -eran
> >> >
> >> >
> >> >
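
The ownership self-check Eran proposes above could look roughly like the
following. This is a minimal sketch of the idea, not the actual HBase
implementation or API: the AssignmentView and RegionHost interfaces, the
method names, and the 60-second interval are all hypothetical placeholders.

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Authoritative assignment state, e.g. as recorded in ZooKeeper or META.
interface AssignmentView {
  String ownerOf(String regionName);
}

// The region server's local view of what it is currently serving.
interface RegionHost {
  String serverName();
  List<String> onlineRegions();
  void closeRegion(String regionName); // relinquish a region we no longer own
}

class OwnershipChecker {
  private final ScheduledExecutorService pool =
      Executors.newSingleThreadScheduledExecutor();

  void start(final RegionHost host, final AssignmentView view) {
    pool.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        for (String region : host.onlineRegions()) {
          // If the authoritative view says another server owns this region,
          // stop serving it rather than risk a double assignment.
          String owner = view.ownerOf(region);
          if (owner != null && !owner.equals(host.serverName())) {
            host.closeRegion(region);
          }
        }
      }
    }, 60, 60, TimeUnit.SECONDS); // hypothetical 60-second check interval
  }
}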
> >> > On Sun, Jul 3, 2011 at 23:49, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Let me try to answer some of your questions.
> >> >> The two paragraphs below follow my reasoning, which is in reverse
> >> >> order of the actual call sequence.
> >> >>
> >> >> For #4 below, the log indicates that the following was executed:
> >> >>
> >> >>   private void assign(final RegionState state,
> >> >>       final boolean setOfflineInZK, final boolean forceNewPlan) {
> >> >>     for (int i = 0; i < this.maximumAssignmentAttempts; i++) {