Accumulo >> mail # user >> Table deletion got stuck


Re: Table deletion got stuck
On Wed, Nov 28, 2012 at 2:22 PM, Lin XIAO <[EMAIL PROTECTED]> wrote:

> I cannot find any initiateClose message ever since 10:38 on the
> tserver. What can I do to test if a tserver hangs because someone
> tries to delete a table with a bad iterator?
>

I think you should see an error logged
by org.apache.accumulo.server.tabletserver.Compactor in the tablet server
logs if an iterator throws an exception.
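As a rough way to check for that, the tablet server log can be scanned for ERROR entries from the Compactor logger. The sketch below is a hypothetical helper, not part of Accumulo. It assumes the Compactor shows up with the short logger name `[tabletserver.Compactor]`, by analogy with the `[tabletserver.TabletServer]` entries quoted further down, and that the level follows the bracketed name as it does in those entries. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for scanning a tserver log; not an Accumulo class.
class CompactorLogScan {

    // Assumption: short logger name in brackets, mirroring the
    // "[tabletserver.TabletServer]" entries seen in this thread.
    static final String LOGGER_TAG = "[tabletserver.Compactor]";

    // Returns the lines written by the Compactor logger at ERROR level.
    static List<String> findCompactorErrors(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            int tag = line.indexOf(LOGGER_TAG);
            if (tag < 0) {
                continue; // some other logger
            }
            // The level follows the bracketed logger name, e.g. "INFO :" or "DEBUG:".
            String rest = line.substring(tag + LOGGER_TAG.length()).trim();
            if (rest.startsWith("ERROR")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CompactorLogScan <tserver log file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String hit : findCompactorErrors(lines)) {
            System.out.println(hit);
        }
    }
}
```

A stack trace normally follows the ERROR line on separate lines, so the matches are only a pointer to where to look in the log, not the full error.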
>
> I'll save the output of running jstack next time.
>
> Thanks,
> Lin
>
> On Wed, Nov 28, 2012 at 11:20 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> > I looked at the tablet server code to see what messages are logged when a
> > tablet is unloaded.  When the unload request is received from the master,
> > it throws a task on a thread pool to do the unload.  Not until this task
> > runs will you actually see anything in the logs.
> >
> > When the task runs, I think one of the following may be executed... but not
> > all... maybe none.
> >
> > log.info("told to unload tablet that was not being served " + extent);
> >
> > log.debug("initiateClose(saveState=" + saveState + " queueMinC=" + queueMinC
> >     + " disableWrites=" + disableWrites + ") " + getExtent());
> >
> > log.debug("Failed to unload tablet " + extent + "... it was alread closing or closed : " + e.getMessage());
> >
> > log.error("Failed to close tablet " + extent + "... Aborting migration", e);
> >
> > If you are not seeing the initiateClose log message, one possibility is that
> > another unload task was tying up the thread pool that processes unload.
> > One common cause of this is someone deleting a table with a bad iterator.
> >
> > Keith
> >
> >
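The thread pool behaviour described in the quoted message above can be reproduced with a plain `java.util.concurrent` pool. This is only an illustration of the general mechanism, not Accumulo code: the class name, the message strings, and the pool sizes are invented for the example, and the real pool size in the tablet server is not stated in this thread. A first task that never finishes stands in for an unload stuck behind a bad iterator; a second task stands in for the unload whose log message never appears.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names and messages are hypothetical.
class UnloadPoolSketch {

    static final long LONG_WAIT_MS = 60_000;

    // Submits a stuck task and then a second task to a pool of the given size.
    // Returns the messages recorded within waitMillis of submitting the second task.
    static List<String> run(int poolSize, long waitMillis) {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch release = new CountDownLatch(1);      // never counted down while we observe
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch secondDone = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            pool.execute(() -> {
                log.add("unload A started");
                firstStarted.countDown();
                await(release, LONG_WAIT_MS);                // stands in for the hung unload
            });
            await(firstStarted, LONG_WAIT_MS);
            pool.execute(() -> {
                log.add("unload B started");                 // only recorded once the task runs
                secondDone.countDown();
            });
            await(secondDone, waitMillis);
            return new ArrayList<>(log);                     // snapshot before cleanup
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    // Waits on a latch without forcing callers to handle InterruptedException.
    static boolean await(CountDownLatch latch, long millis) {
        try {
            return latch.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("pool of 1: " + run(1, 500));
        System.out.println("pool of 2: " + run(2, 500));
    }
}
```

With a single worker, the second task stays queued behind the stuck one, so its message should not show up during the wait. With a spare worker it should run right away. A thread dump (jstack) taken while the pool is blocked would show the worker thread parked inside the first task.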
> > On Wed, Nov 28, 2012 at 11:07 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >>
> >> No. I think the clock on the server was off by about 5 minutes. I didn't
> >> realize that ntp wasn't running on the server until seeing the
> >> problems.
> >>
> >> On Wed, Nov 28, 2012 at 10:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> >> > Are the times on the master and tablet server synched?  The load of n8<<
> >> > on the tablet server seems to occur after delete is waiting for it.
> >> >
> >> > master.log : 27 11:48:04,332 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: n8 locationState: n8<<@(null,10.0.0.10:41000[43b1b039a081368],null)
> >> > tserver.log : 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<
> >> >
> >> >
> >> > On Wed, Nov 28, 2012 at 10:44 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> n8 was an empty table created through the shell.  Here are the logs on
> >> >> machine 10.0.0.10
> >> >>
> >> >> 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] INFO : cloud9/10.0.0.10:41000: got assignment from master: n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: Loading extent: n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: verifying extent n8<<
> >> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: Looking at metadata {n8< future:43b1b039a081368 [] 423355 false=10.0.0.10:41000, n8< srv:dir [] 423354 false=/default_tablet, n8< srv:lock [] 423354 false=masters/lock/zlock-0000000184$43b1b039a08ad85, n8< srv:time [] 423354 false=M0, n8< ~tab:~pr [] 423354 false=}
> >> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: got [] for logs for n8<<
> >> >> 27 11:52:25,230 [tabletserver.Tablet] TABLET_HIST: n8<< opened
> >> >>
> >> >> Thanks,
> >> >> Lin
> >> >>
> >> >> On Wed, Nov 28, 2012 at 8:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> >> >> > Can you look at the logs for tablet server 10.0.0.10 and see what was
> >> >> > going on with tablet n8<<?
> >> >> >
> >> >> > Keith
> >> >> >
> >> >> >
> >> >> > On Tue, Nov 27, 2012 at 6:20 PM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >> >> >>
> >> >> >> I've only gone through the master log generated today for FAILED