HBase >> mail # user >> Errors after major compaction


Re: Errors after major compaction
Appreciate it, sorry I didn't get to it sooner. Had some crazy days :)

-eran

On Tue, Jul 5, 2011 at 17:19, Ted Yu <[EMAIL PROTECTED]> wrote:

> Eran:
> I logged https://issues.apache.org/jira/browse/HBASE-4060 for you.
>
> On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Thanks for the understanding.
> >
> > Can you log a JIRA and put your ideas below in it ?
> >
> >
> >
> > On Jul 4, 2011, at 12:42 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for the explanation Ted,
> > >
> > > > I will try to apply HBASE-3789 and hope for the best, but my
> > > > understanding is that it doesn't really solve the problem; it only
> > > > reduces the probability of it happening, at least in one particular
> > > > scenario. I would hope for a more robust solution.
> > > > My concern is that the region allocation process seems to rely too much
> > > > on timing considerations and doesn't seem to take enough measures to
> > > > guarantee conflicts do not occur. I understand that in a distributed
> > > > environment, when you don't get a timely response from a remote machine
> > > > you can't know for sure if it did or did not receive the request.
> > > > However, there are things that can be done to mitigate this and reduce
> > > > the conflict time significantly. For example, when I run hbck it knows
> > > > that some regions are multiply assigned; the master could do the same
> > > > and try to resolve the conflict. Another approach would be to handle
> > > > late responses: even if the response from the remote machine arrives
> > > > after it was assumed to be dead, the master should have enough
> > > > information to know it had created a conflict by assigning the region
> > > > to another server. An even better solution, I think, is for the RS to
> > > > periodically test that it is indeed the rightful owner of every region
> > > > it holds and relinquish control over the region if it's not.
> > > > Obviously a state where two RSs hold the same region is pathological
> > > > and can lead to data loss, as demonstrated in my case. The system
> > > > should be able to actively protect itself against such a scenario. It
> > > > probably doesn't need saying, but there is really nothing worse for a
> > > > data storage system than data loss.
> > >
> > > In my case the problem didn't happen in the initial phase but after
> > > disabling and enabling a table with about 12K regions.
> > >
> > > -eran
> > >
> > >
> > >
> > > On Sun, Jul 3, 2011 at 23:49, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > >> Let me try to answer some of your questions.
> > >> The two paragraphs below were written along my reasoning, which is in
> > >> reverse order of the actual call sequence.
> > >>
> > >> For #4 below, the log indicates that the following was executed:
> > >> private void assign(final RegionState state, final boolean setOfflineInZK,
> > >>     final boolean forceNewPlan) {
> > >>   for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
> > >>     if (setOfflineInZK && !setOfflineInZooKeeper(state)) return;
> > >>
> > >> The above was due to the timeout which you noted in #2, which would
> > >> have caused TimeoutMonitor.chore() to run this code (line 1787):
> > >>
> > >>     for (Map.Entry<HRegionInfo, Boolean> e: assigns.entrySet()){
> > >>       assign(e.getKey(), false, e.getValue());
> > >>     }
> > >>
> > >> This means there is a lack of coordination between
> > >> AssignmentManager.TimeoutMonitor and OpenedRegionHandler.
> > >>
> > >> The reason I mention HBASE-3789 is that it is marked as Incompatible
> > >> change and is in TRUNK already.
> > >> The application of HBASE-3789 to 0.90 branch would change the behavior
> > >> (timing) of region assignment.
> > >>
> > >> I think it makes sense to evaluate the effect of HBASE-3789 in 0.90.4
> > >>
> > >> BTW were the incorrect region assignments observed for a table with
> > >> multiple initial regions ?
> >
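
Two of the safeguards proposed in the thread (the master spotting multiply assigned regions the way hbck does, and each region server periodically checking that it is still the rightful owner of the regions it holds) can be illustrated with a small sketch. This is not HBase code: the class `RegionOwnershipSketch`, both method names, and the plain string identifiers for servers and regions are hypothetical, and the "authoritative owner" map merely stands in for whatever source of truth a real implementation would consult.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in HBase.
public class RegionOwnershipSketch {

    /**
     * Master-side check: inverts a "server -> regions it reports holding"
     * view and returns only the regions claimed by two or more servers.
     */
    static Map<String, Set<String>> findMultiplyAssigned(
            Map<String, Set<String>> regionsByServer) {
        Map<String, Set<String>> holders = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : regionsByServer.entrySet()) {
            for (String region : e.getValue()) {
                holders.computeIfAbsent(region, r -> new TreeSet<>()).add(e.getKey());
            }
        }
        // Keep only the conflicts: regions with more than one holder.
        holders.values().removeIf(servers -> servers.size() < 2);
        return holders;
    }

    /**
     * Region-server-side check: returns the regions this server holds that
     * the authoritative record assigns to someone else. A region missing
     * from the record is treated as not owned (an assumption of this sketch).
     */
    static Set<String> regionsToRelinquish(String self, Set<String> held,
            Map<String, String> authoritativeOwner) {
        Set<String> stale = new TreeSet<>();
        for (String region : held) {
            if (!self.equals(authoritativeOwner.get(region))) {
                stale.add(region);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> view = new HashMap<>();
        view.put("rs1", new HashSet<>(Arrays.asList("regionA", "regionB")));
        view.put("rs2", new HashSet<>(Arrays.asList("regionB", "regionC")));
        System.out.println(findMultiplyAssigned(view)); // {regionB=[rs1, rs2]}

        Map<String, String> owner = new HashMap<>();
        owner.put("regionA", "rs1");
        owner.put("regionB", "rs2");
        owner.put("regionC", "rs2");
        System.out.println(regionsToRelinquish("rs1", view.get("rs1"), owner)); // [regionB]
    }
}
```

Neither check removes the underlying race between a timed-out assignment and a late response; they only shorten the window during which two servers hold the same region, which is the mitigation the thread asks for.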