HBase >> mail # user >> Errors after major compaction


Re: Errors after major compaction
No, but I did run a major compaction.
As I explained initially, I disabled the table so I could change its TTL,
then re-enabled it and ran a major compaction so it would clean up the
data that expired due to the TTL change.

-eran
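For reference, the disable / change-TTL / enable / major-compact sequence
described above corresponds roughly to the client calls below. This is only a
sketch: the column family name "cf" and the TTL value are placeholders, and
HBaseAdmin method signatures vary somewhat between HBase releases, so check
the javadoc for the version actually in use.

    // Sketch: take the table offline, change the family TTL, bring it back,
    // then ask for a major compaction so cells past the new TTL get dropped.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ChangeTtlAndCompact {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String table = "gs_raw_events";

        admin.disableTable(table);            // schema changes require the table offline

        // Fetch the existing family descriptor so only the TTL changes.
        HColumnDescriptor cf =
            admin.getTableDescriptor(Bytes.toBytes(table)).getFamily(Bytes.toBytes("cf"));
        cf.setTimeToLive(7 * 24 * 60 * 60);   // new TTL in seconds (placeholder value)
        admin.modifyColumn(table, cf);        // signature differs across versions

        admin.enableTable(table);             // regions get reassigned and reopened
        admin.majorCompact(table);            // async request; drops expired cells on rewrite
      }
    }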

On Wed, Jul 6, 2011 at 02:43, Ted Yu <[EMAIL PROTECTED]> wrote:

> Eran:
> You didn't run hbck during the enabling of gs_raw_events table, right ?
>
> I saw:
> 2011-06-29 16:43:50,395 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> Compaction (major) requested for
> gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> because User-triggered major compaction; priority=1, compaction queue size=1248
>
> The above might be related to:
> >> 2011-06-29 16:43:57,880 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> >> Region has been PENDING_OPEN for too long, reassigning
> >> region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>
> Thanks
>
> On Tue, Jul 5, 2011 at 7:19 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Eran:
> > I logged https://issues.apache.org/jira/browse/HBASE-4060 for you.
> >
> >
> > On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks for the understanding.
> >>
> >> Can you log a JIRA and put your ideas below in it ?
> >>
> >>
> >>
> >> On Jul 4, 2011, at 12:42 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >>
> >> > Thanks for the explanation Ted,
> >> >
> >> > I will try to apply HBASE-3789 and hope for the best but my
> >> > understanding is that it doesn't really solve the problem, it only
> >> > reduces the probability of it happening, at least in one particular
> >> > scenario. I would hope for a more robust solution.
> >> > My concern is that the region allocation process seems to rely too
> >> > much on timing considerations and doesn't seem to take enough
> >> > measures to guarantee conflicts do not occur. I understand that in a
> >> > distributed environment, when you don't get a timely response from a
> >> > remote machine you can't know for sure if it did or did not receive
> >> > the request; however, there are things that can be done to mitigate
> >> > this and reduce the conflict time significantly. For example, when I
> >> > run hbck it knows that some regions are multiply assigned; the master
> >> > could do the same and try to resolve the conflict. Another approach
> >> > would be to handle late responses: even if the response from the
> >> > remote machine arrives after it was assumed to be dead, the master
> >> > should have enough information to know it had created a conflict by
> >> > assigning the region to another server. An even better solution, I
> >> > think, is for the RS to periodically test that it is indeed the
> >> > rightful owner of every region it holds and relinquish control over
> >> > the region if it's not.
> >> > Obviously a state where two RSs hold the same region is pathological
> >> > and can lead to data loss, as demonstrated in my case. The system
> >> > should be able to actively protect itself against such a scenario. It
> >> > probably doesn't need saying, but there is really nothing worse for a
> >> > data storage system than data loss.
> >> >
> >> > In my case the problem didn't happen in the initial phase but after
> >> > disabling and enabling a table with about 12K regions.
> >> >
> >> > -eran
> >> >
> >> >
> >> >
> >> > On Sun, Jul 3, 2011 at 23:49, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Let me try to answer some of your questions.
> >> >> The two paragraphs below were written along my reasoning, which is in
> >> >> reverse order of the actual call sequence.
> >> >>
> >> >> For #4 below, the log indicates that the following was executed:
> >> >> private void assign(final RegionState state, final boolean setOfflineInZK,
> >> >>     final boolean forceNewPlan) {
> >> >>   for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
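
A side note on the ownership check Eran proposes in the quoted message above
(a region server periodically verifying that it is still the rightful owner of
every region it holds): the sketch below is purely illustrative. The
RegionServerView type and its methods do not exist in HBase; they are
hypothetical names for the steps involved, i.e. list the regions served
locally, look up the authoritative assignment, and close the region if another
server owns it.

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    /** Hypothetical view of the local region server; no such interface exists in HBase. */
    interface RegionServerView {
      String serverName();                       // name this server registered under
      List<String> onlineRegions();              // regions this server believes it hosts
      String assignedOwner(String regionName);   // authoritative owner, e.g. as recorded in META
      void closeRegionQuietly(String regionName);
    }

    /** Periodic task sketching the "am I still the assigned owner?" check. */
    class OwnershipCheck implements Runnable {
      private final RegionServerView rs;

      OwnershipCheck(RegionServerView rs) {
        this.rs = rs;
      }

      @Override
      public void run() {
        for (String region : rs.onlineRegions()) {
          String owner = rs.assignedOwner(region);
          // If the authoritative assignment points at another server, stop serving
          // the region instead of risking a double assignment and possible data loss.
          if (owner != null && !owner.equals(rs.serverName())) {
            rs.closeRegionQuietly(region);
          }
        }
      }

      static void schedule(RegionServerView rs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(new OwnershipCheck(rs), 1, 1, TimeUnit.MINUTES);
      }
    }

Relinquishing the region on a mismatch is the conservative choice here: two
servers both serving the same region is exactly the pathological state the
thread describes.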