Accumulo >> mail # user >> Table deletion got stuck


Re: Table deletion got stuck
On Wed, Nov 28, 2012 at 2:22 PM, Lin XIAO <[EMAIL PROTECTED]> wrote:

> I cannot find any initiateClose message ever since 10:38 on the
> tserver. What can I do to test if a tserver hangs because someone
> tries to delete a table with a bad iterator?
>

I think you should see an error logged
by org.apache.accumulo.server.tabletserver.Compactor in the tablet server
logs if an iterator throws an exception.
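
One way to check for this is to filter the tablet server log for error lines from the Compactor class. The sketch below is only an illustration: it assumes the log lines follow the layout of the tserver lines quoted further down in this thread (`27 11:52:25,220 [tabletserver.TabletServer] INFO : ...`) and that the Compactor logger appears as `[tabletserver.Compactor]`. The class name `CompactorErrorScan` and the method `compactorErrors` are made up for the example; adjust the pattern to whatever your logs actually contain.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CompactorErrorScan {
    // ASSUMPTION: the logger is abbreviated as "[tabletserver.Compactor]" and the
    // level follows it, as in the "[tabletserver.TabletServer] INFO :" lines
    // quoted in this thread. This is not taken from Accumulo documentation.
    private static final Pattern COMPACTOR_ERROR =
        Pattern.compile("\\[tabletserver\\.Compactor\\]\\s+ERROR");

    /** Returns only the lines that look like ERROR entries from the Compactor. */
    static List<String> compactorErrors(List<String> lines) {
        return lines.stream()
            .filter(line -> COMPACTOR_ERROR.matcher(line).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java CompactorErrorScan <path-to-tserver-log>");
            return;
        }
        // Read the whole log file and print the matching lines.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        compactorErrors(lines).forEach(System.out::println);
    }
}
```

If nothing matches, that does not prove the iterator is fine; it only means no line with this assumed layout was found.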
>
> I'll save the output of running jstack next time.
>
> Thanks,
> Lin
>
> On Wed, Nov 28, 2012 at 11:20 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> > I looked at the tablet server code to see what messages are logged when a
> > tablet is unloaded. When the unload request is received from the master,
> > it throws a task on a thread pool to do the unload. Not until this task
> > runs will you actually see anything in the logs.
> >
> > When the task runs, I think one of the following may be executed... but not
> > all... maybe none.
> >
> > log.info("told to unload tablet that was not being served " + extent);
> >
> > log.debug("initiateClose(saveState=" + saveState + " queueMinC=" + queueMinC
> > + " disableWrites=" + disableWrites + ") " + getExtent());
> >
> > log.debug("Failed to unload tablet " + extent + "... it was alread closing or closed : " + e.getMessage());
> >
> > log.error("Failed to close tablet " + extent + "... Aborting migration", e);
> >
> > If you are not seeing the initiateClose log message, one possibility is that
> > another unload task was tying up the thread pool that processes unload
> > requests. One common cause of this is someone deleting a table with a bad
> > iterator.
> >
> > Keith
> >
> >
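
The thread pool explanation above can be illustrated with a minimal, self-contained sketch. This is not Accumulo code: the pool size of 1, the class name `BlockedUnloadPool`, and the task bodies are all assumptions chosen to show the effect. One task hangs (standing in for an unload stuck behind a bad iterator), and a second task queued behind it never gets to run, so its "initiateClose" message never shows up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockedUnloadPool {
    /** Returns the messages "logged" while the first task was still hung. */
    static List<String> run(long waitMillis) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch started = new CountDownLatch(1); // first task has begun
        CountDownLatch never = new CountDownLatch(1);   // never released: simulates a hang

        // ASSUMPTION: a single worker thread, to make the starvation obvious.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // Task 1: a hypothetical unload that hangs.
        pool.execute(() -> {
            log.add("unload of t1 started");
            started.countDown();
            try {
                never.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Task 2: would log initiateClose, but sits in the queue behind task 1.
        pool.execute(() -> log.add("initiateClose n8<<"));

        started.await();
        Thread.sleep(waitMillis);                 // give task 2 a chance it cannot use
        List<String> seen = new ArrayList<>(log); // snapshot while task 1 is still hung

        pool.shutdownNow();                       // interrupt the hung task and clean up
        pool.awaitTermination(1, TimeUnit.SECONDS);
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        run(200).forEach(System.out::println);
    }
}
```

In a real tablet server, a jstack dump (as Lin mentions) is the way to see what such a hung thread is actually doing.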
> > On Wed, Nov 28, 2012 at 11:07 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >>
> >> No. I think there was about a 5 minute delay on the server. I didn't
> >> realize that ntp wasn't running on the server until seeing the
> >> problems.
> >>
> >> On Wed, Nov 28, 2012 at 10:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> >> > Are the times on the master and tablet server synched?  The load of n8<<
> >> > on the tablet server seems to occur after delete is waiting for it.
> >> >
> >> > master.log : 27 11:48:04,332 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: n8 locationState: n8<<@(null,10.0.0.10:41000[43b1b039a081368],null)
> >> > tserver.log : 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<
> >> >
> >> >
> >> > On Wed, Nov 28, 2012 at 10:44 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> n8 was an empty table created through the shell.  Here are the logs on
> >> >> machine 10.0.0.10
> >> >>
> >> >> 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] INFO : cloud9/10.0.0.10:41000: got assignment from master: n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: Loading extent: n8<<
> >> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: verifying extent n8<<
> >> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: Looking at metadata {n8< future:43b1b039a081368 [] 423355 false=10.0.0.10:41000, n8< srv:dir [] 423354 false=/default_tablet, n8< srv:lock [] 423354 false=masters/lock/zlock-0000000184$43b1b039a08ad85, n8< srv:time [] 423354 false=M0, n8< ~tab:~pr [] 423354 false=}
> >> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: got [] for logs for n8<<
> >> >> 27 11:52:25,230 [tabletserver.Tablet] TABLET_HIST: n8<< opened
> >> >>
> >> >> Thanks,
> >> >> Lin
> >> >>
> >> >> On Wed, Nov 28, 2012 at 8:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> >> >> > Can you look at the logs for tablet server 10.0.0.10 and see what was
> >> >> > going on with tablet n8<<?
> >> >> >
> >> >> > Keith
> >> >> >
> >> >> >
> >> >> > On Tue, Nov 27, 2012 at 6:20 PM, Lin XIAO <[EMAIL PROTECTED]> wrote:
> >> >> >>
> >> >> >> I've only gone through the master log generated today for FAILED