Accumulo >> mail # user >> Uneven distribute of Hosted Tablets?


Re: Uneven distribute of Hosted Tablets?
Hmm.  Anything on the one that reported assignment failed?

Billie
On Fri, May 31, 2013 at 9:53 AM, Ott, Charles H. <[EMAIL PROTECTED]> wrote:

> 2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Got
> unloadTablet message from user: !SYSTEM
>
> 2013-05-31 09:49:53,471 [tabletserver.Tablet] DEBUG:
> initiateClose(saveState=true queueMinC=false disableWrites=false) !0;!0<<
>
> 2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Failed to
> unload tablet !0;!0<<... it was alread closing or closed : Tablet !0;!0<<
> already closing
>
> The timestamp is 12 minutes off, since the clocks are out of sync, but
> there seems to be the same number of debug statements above as there were
> errors in the master.
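
For anyone matching tserver log lines against master log lines on a cluster with unsynchronized clocks, here is a minimal Python sketch that estimates the offset between two log timestamps. It assumes the two sample lines (copied from this thread and unwrapped) describe the same event, which the thread does not confirm; `parse_log_ts` and `offset_seconds` are hypothetical helper names, not part of Accumulo.

```python
from datetime import datetime

# Timestamps in these logs look like "2013-05-31 09:37:57,592".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line):
    """Parse the leading timestamp (first 23 characters) of a log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


def offset_seconds(master_line, tserver_line):
    """Seconds by which the tserver timestamp is ahead of the master timestamp."""
    delta = parse_log_ts(tserver_line) - parse_log_ts(master_line)
    return delta.total_seconds()


# Sample lines from this thread, unwrapped. Pairing them is an assumption.
master_line = (
    "2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 "
    "reports unload failed for tablet !0;!0<<"
)
tserver_line = (
    "2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: "
    "Got unloadTablet message from user: !SYSTEM"
)

skew = offset_seconds(master_line, tserver_line)
print(f"tserver log is about {skew / 60:.1f} minutes ahead of the master log")
```

The two samples differ by 715.879 seconds, which is consistent with the "12 minutes off" figure above.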
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Billie Rinaldi
> Sent: Friday, May 31, 2013 12:47 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Uneven distribute of Hosted Tablets?
>
> Can you go to one of those servers that is reporting unload / assignment
> failed and check its tserver log to see why it failed?
>
> Billie
>
> On Fri, May 31, 2013 at 9:39 AM, Ott, Charles H. <[EMAIL PROTECTED]> wrote:
>
> I am not sure if I am using one of the balancers that comes with
> Accumulo.  There are some errors in my logs for the master since I did the
> clean shutdown/startup this morning:
>
> 2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 reports
> unload failed for tablet !0;!0<< (A lot of these errors showed up)
>
> 2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 reports
> assignment failed for tablet !0;!0<< (only one of these)
>
> 2013-05-31 09:37:05,784 [master.Master] ERROR: master:
> 1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed for
> tablet !0;!0<< (a lot of these)
>
> The entire batch of errors all occurred within 1 minute.  Then they don’t
> occur anymore.
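
To put numbers on "a lot of these", a rough Python sketch along the following lines could tally the master's failure reports per tablet server. The regular expression is an assumption inferred only from the three log lines quoted above (unwrapped onto single lines, with the parenthetical remarks removed); `tally_failures` is a hypothetical helper, and other Accumulo versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern inferred from the three sample lines in this thread (an assumption).
FAILURE = re.compile(
    r"\[master\.Master\] ERROR: (?:master: \S+ )?"
    r"(?P<server>\d+\.\d+\.\d+\.\d+:\d+) reports "
    r"(?P<kind>unload|assignment) failed for tablet (?P<tablet>\S+)"
)


def tally_failures(lines):
    """Count failure reports keyed by (tablet server, failure kind)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[(match.group("server"), match.group("kind"))] += 1
    return counts


# The three master log lines from this thread, unwrapped.
sample = [
    "2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 "
    "reports unload failed for tablet !0;!0<<",
    "2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 "
    "reports assignment failed for tablet !0;!0<<",
    "2013-05-31 09:37:05,784 [master.Master] ERROR: master: "
    "1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed "
    "for tablet !0;!0<<",
]

counts = tally_failures(sample)
for (server, kind), n in sorted(counts.items()):
    print(f"{server} {kind} failed: {n}")
```

Running the same function over the tserver log's "Failed to unload tablet" lines would need a different pattern, but it would allow the two counts to be compared directly.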
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Billie Rinaldi
> Sent: Friday, May 31, 2013 12:14 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Uneven distribute of Hosted Tablets?
>
> So (at the risk of stating the obvious) it seems like your cluster is in a
> funny state.  I would expect the counts in the "Hosted Tablets" column to
> all be roughly the same, especially after restarting the master, assuming
> you're using one of the balancers that comes with Accumulo.  It's possible
> the cluster has gotten into this state due to the clock differences.
> Accumulo has a mechanism called "logical time" to deal with clock
> differences, but it is not enabled by default.  You can enable it when you
> create a table.  If you don't enable this it is recommended that you use
> NTP to synchronize the clocks on your cluster.  The !METADATA table has
> logical time by default, but your other tables might not contain what you
> expect them to if you haven't enabled logical time.
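
As a rough illustration of why skewed clocks can leave a table holding something other than what you expect, here is a toy Python model. It is not Accumulo's implementation, and the real tablet server may include safeguards this sketch omits. `ToyTablet` and `ms_of_day` are hypothetical names, and the scenario (a tablet served first by a node with a fast clock, then by one with a slow clock) is an assumption chosen for the example, using two of the `date` readings quoted further down. If memory serves, the real switch is `TimeType.LOGICAL` at table creation in the 1.x Java client, or `createtable -tl` in the shell.

```python
def ms_of_day(hhmmss):
    """Convert "HH:MM:SS" to milliseconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000


class ToyTablet:
    """Toy tablet that keeps only the highest-timestamped value per key."""

    def __init__(self, logical):
        self.logical = logical
        self.counter = 0   # one-up counter used only in logical mode
        self.cells = {}    # key -> (timestamp, value)

    def write(self, key, value, server_clock_ms):
        if self.logical:
            # Logical time: ignore the wall clock, always move forward.
            self.counter += 1
            ts = self.counter
        else:
            # Wall-clock time: trust whatever the hosting server's clock says.
            ts = server_clock_ms
        current = self.cells.get(key)
        if current is None or ts >= current[0]:
            self.cells[key] = (ts, value)

    def read(self, key):
        return self.cells[key][1]


fast_clock = ms_of_day("11:05:48")  # reading from 1620-Node1
slow_clock = ms_of_day("10:52:49")  # reading from 1620-accumulo

results = {}
for label, logical in (("wall clock", False), ("logical", True)):
    tablet = ToyTablet(logical)
    tablet.write("row1", "first", fast_clock)   # earlier write, fast clock
    tablet.write("row1", "second", slow_clock)  # later write, slow clock
    results[label] = tablet.read("row1")
    print(f"{label}: newest visible value is {results[label]!r}")
```

In the wall-clock case the later write carries a smaller timestamp and is shadowed by the earlier one; with the counter, write order alone decides which value wins.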
>
> That said, I'm not sure why the clock issue would be affecting the
> balancing.  You mentioned the new warnings you saw on the monitor page
> after you restarted the system.  Could you see if there are any older
> errors in your log files?
>
> Billie
>
> On Fri, May 31, 2013 at 8:10 AM, Ott, Charles H. <[EMAIL PROTECTED]> wrote:
>
> -bash-4.1$ ssh 1620-accumulo
> -bash-4.1$ date
> Fri May 31 10:52:49 EDT 2013
>
> -bash-4.1$ ssh 1620-Node1
> -bash-4.1$ date
> Fri May 31 11:05:48 EDT 2013
>
> -bash-4.1$ ssh 1620-Node2
> -bash-4.1$ date
> Fri May 31 11:05:58 EDT 2013
>
> -bash-4.1$ ssh 1620-Node3