Accumulo >> mail # user >> Table deletion got stuck


Re: Table deletion got stuck
I cannot find any initiateClose message on the tserver since 10:38.
What can I do to test whether a tserver hangs because someone
tries to delete a table with a bad iterator?

I'll save the output of running jstack next time.

Thanks,
Lin

On Wed, Nov 28, 2012 at 11:20 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> I looked at the tablet server code to see what messages are logged when a
> tablet is unloaded. When the unload request is received from the master,
> it throws a task on a thread pool to do the unload. Not until this task
> runs will you actually see anything in the logs.
>
> When the task runs, I think one of the following may be executed... but not
> all... maybe none.
>
> log.info("told to unload tablet that was not being served " + extent);
>
> log.debug("initiateClose(saveState=" + saveState + " queueMinC=" + queueMinC
> + " disableWrites=" + disableWrites + ") " + getExtent());
>
>  log.debug("Failed to unload tablet " + extent + "... it was alread closing
> or closed : " + e.getMessage());
>
> log.error("Failed to close tablet " + extent + "... Aborting migration", e);
>
> If you are not seeing the initiateClose log message, one possibility is that
> another unload task was tying up the thread pool that processes unloads.
> One common cause of this is someone deleting a table with a bad iterator.
>
> Keith
>
>
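The starvation scenario described above can be reproduced in isolation. The following is only a minimal sketch using the standard java.util.concurrent classes, not Accumulo's actual code: the class name UnloadPoolSketch, the method secondTaskStarts, and the pool sizes are all hypothetical. It shows that when every worker in a fixed-size pool is stuck in an earlier task, a later task never starts, so a log statement at the top of that task (such as the initiateClose message) would never be written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from Accumulo.
public class UnloadPoolSketch {

    /**
     * Submits one task that blocks (standing in for a stuck unload), then a
     * second task, to a fixed pool of the given size. Returns true if the
     * second task began running within waitMillis while the first was blocked.
     */
    public static boolean secondTaskStarts(int poolSize, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch firstRunning = new CountDownLatch(1);
        CountDownLatch secondStarted = new CountDownLatch(1);
        try {
            // First task: occupies a worker thread until released.
            pool.submit(() -> {
                firstRunning.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            firstRunning.await();

            // Second task: a real unload task would log at this point.
            pool.submit(() -> {
                secondStarted.countDown();
            });
            return secondStarted.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Unblock the first task so the pool can shut down cleanly.
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size 1, second task started: " + secondTaskStarts(1, 300));
        System.out.println("pool size 2, second task started: " + secondTaskStarts(2, 2000));
    }
}
```

With a single worker the second task stays queued for as long as the first is blocked; with a spare worker it starts right away. In a jstack dump, the analogous symptom would be the pool's worker threads all sitting inside an earlier task.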
> On Wed, Nov 28, 2012 at 11:07 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
>>
>> No. I think the clock on the server was off by about 5 minutes. I didn't
>> realize that ntp wasn't running on the server until I saw the
>> problems.
>>
>> On Wed, Nov 28, 2012 at 10:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>> > Are the times on the master and tablet server synched?  The load of n8<<
>> > on the tablet server seems to occur after delete is waiting for it.
>> >
>> > master.log : 27 11:48:04,332 [tableOps.CleanUp] DEBUG: Still waiting for
>> > table to be deleted: n8 locationState:
>> > n8<<@(null,10.0.0.10:41000[43b1b039a081368],null)
>> > tserver.log : 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading
>> > tablet n8<<
>> >
>> >
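The apparent gap between the two log lines quoted above can be computed directly from their timestamps. This is a rough sketch, assuming both lines use the "day HH:mm:ss,SSS" prefix shown in the thread and fall on the same day; the class LogSkewSketch and its methods are hypothetical helpers, not part of Accumulo. The result is only the difference between the two stamps, not a measurement of clock skew, since the two lines record different events.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for comparing timestamps of two log lines.
public class LogSkewSketch {

    // Time-of-day format seen in the quoted logs, e.g. "11:48:04,332".
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Extracts the time of day from a line such as "27 11:48:04,332 [..] ...". */
    public static LocalTime parseTime(String logLine) {
        // Field 0 is the day of month, field 1 is the time of day.
        String[] fields = logLine.trim().split("\\s+", 3);
        return LocalTime.parse(fields[1], TIME);
    }

    /** Difference between the timestamps of two same-day log lines. */
    public static Duration gap(String earlierLine, String laterLine) {
        return Duration.between(parseTime(earlierLine), parseTime(laterLine));
    }

    public static void main(String[] args) {
        String master = "27 11:48:04,332 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: n8";
        String tserver = "27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<";
        System.out.println("gap in ms: " + gap(master, tserver).toMillis());
    }
}
```

A gap of several minutes in the "wrong" direction between a master line and a tablet server line, as here, is a hint to check whether the two hosts' clocks agree before reading anything into the ordering of events.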
>> > On Wed, Nov 28, 2012 at 10:44 AM, Lin XIAO <[EMAIL PROTECTED]> wrote:
>> >>
>> >> n8 was an empty table created through the shell.  Here are the logs on
>> >> machine 10.0.0.10
>> >>
>> >> 27 11:52:25,220 [tabletserver.TabletServer] INFO : Loading tablet n8<<
>> >> 27 11:52:25,221 [tabletserver.TabletServer] INFO :
>> >> cloud9/10.0.0.10:41000: got assignment from master: n8<<
>> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: Loading extent: n8<<
>> >> 27 11:52:25,221 [tabletserver.TabletServer] DEBUG: verifying extent
>> >> n8<<
>> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: Looking at metadata {n8<
>> >> future:43b1b039a081368 [] 423355 false=10.0.0.10:41000, n8< srv:dir []
>> >> 423354 false=/default_tablet, n8< srv:lock [] 423354
>> >> false=masters/lock/zlock-0000000184$43b1b039a08ad85, n8< srv:time []
>> >> 423354 false=M0, n8< ~tab:~pr [] 423354 false=}
>> >> 27 11:52:25,223 [tabletserver.Tablet] DEBUG: got [] for logs for n8<<
>> >> 27 11:52:25,230 [tabletserver.Tablet] TABLET_HIST: n8<< opened
>> >>
>> >> Thanks,
>> >> Lin
>> >>
>> >> On Wed, Nov 28, 2012 at 8:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>> >> > Can you look at the logs for tablet server 10.0.0.10 and see what was
>> >> > going on with tablet n8<<?
>> >> >
>> >> > Keith
>> >> >
>> >> >
>> >> > On Tue, Nov 27, 2012 at 6:20 PM, Lin XIAO <[EMAIL PROTECTED]> wrote:
>> >> >>
>> >> >> I've only gone through the master log generated today for FAILED
>> >> >> transactions.
>> >> >> CreateTable operations failed because the table already exists, while
>> >> >> the DeleteTable failed because the table doesn't exist. I think the
>> >> >> user ran his Hadoop jobs several times with the same table names. If the
>> >> >> table cannot be deleted, the following create operations will fail.
>> >> >> I'm not sure why he tried to delete a nonexistent table though.
>> >> >>