Accumulo, mail # user - Bloom filter thread failure errors


Terry P. 2013-11-15, 21:31
Eric Newton 2013-11-19, 02:56
Terry P. 2013-12-05, 00:29
Keith Turner 2013-12-05, 15:50
Terry P. 2013-12-05, 20:37
Re: Bloom filter thread failure errors
Keith Turner 2013-12-05, 21:21
On Thu, Dec 5, 2013 at 3:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Keith,
> Here is the stack trace in the tserver DEBUG log for this most recent
> exception. The exception section is the same as what's in the main tserver
> log, but of course the MajC bits don't appear in the main log. This is
> hand-typed, but I'm pretty sure it's right.
>
>
> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
> 0.00 secs, wait 0.00 secs
> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
> [/t-0000aa9/C0000zn4.rf_tmp
> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
> closed
> java.lang.IllegalStateException: File
> /accumulo/tables/2/t-0000aa9/C0000zmf.rf is closed
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:244)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:142)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:211)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:307)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:357)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:142)
>   at
> org.apache.accumulo.core.file.rfile.RFile$Reader.getMetaStore(RFile.java:927)
>   at
> org.apache.accumulo.core.file.BloomFilterLayer$BloomFilterLoader$1.run(BloomFilterLayer.java:210)
>   at
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> 2013-12-03 11:48:14,900 [tabletserver.Compactor] DEBUG: Compaction 2;f;d
> 280 read | 280 written |  2,500 entries/sec | 0.112 secs
> 2013-12-03 11:48:14,924 [tabletserver.Tablet] DEBUG: MajC finish lock 0.00
> secs
>
> The biggest bummer here is just that it appears on the Monitor GUI as an
> error, and we all know how Operators don't like "errors" on their screens
> ;-)  But if this is one that can be safely ignored, we'll just have to
> write that up in a procedure somewhere.
>

The code in BloomFilterLayer$BloomFilterLoader$1.run() logs IOExceptions at
debug level when the file is closed. Because this is an IllegalStateException,
it's not being ignored. Would you like to open a bug for this?
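
To illustrate the distinction described above, here is a minimal, self-contained
sketch. It is not the Accumulo source: the class BloomLoaderSketch, the
FakeReader stand-in, the load() helper, the "bloom-meta" block name, and the
file path are all hypothetical and exist only for this example. It assumes, as
the stack trace suggests, that a closed file surfaces as an
IllegalStateException rather than an IOException, and shows how treating both
the same way would move the message from ERROR to DEBUG.

```java
import java.io.IOException;

/** Hypothetical sketch of the logging decision; not the Accumulo implementation. */
final class BloomLoaderSketch {

    /** Stand-in for a file reader that a compaction may close concurrently. */
    static final class FakeReader {
        private final String path;
        private volatile boolean closed;

        FakeReader(String path) {
            this.path = path;
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }

        /** Mimics the trace: a closed file raises IllegalStateException, not IOException. */
        byte[] getMetaBlock(String name) throws IOException {
            if (closed) {
                throw new IllegalStateException("File " + path + " is closed");
            }
            return new byte[0];
        }
    }

    /**
     * Returns the level at which the load attempt would be reported.
     * With treatClosedAsBenign == false, only an IOException on a closed file
     * is downgraded to DEBUG, so the IllegalStateException ends up as ERROR.
     * With treatClosedAsBenign == true (one possible fix, an assumption here),
     * both exception types are downgraded when the file is closed.
     */
    static String load(FakeReader reader, boolean treatClosedAsBenign) {
        try {
            reader.getMetaBlock("bloom-meta"); // hypothetical block name
            return "OK";
        } catch (IOException e) {
            return reader.isClosed() ? "DEBUG" : "ERROR";
        } catch (IllegalStateException e) {
            return (treatClosedAsBenign && reader.isClosed()) ? "DEBUG" : "ERROR";
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader("/hypothetical/table/file.rf");
        reader.close(); // simulate the file being closed before the loader runs
        System.out.println(load(reader, false)); // prints ERROR
        System.out.println(load(reader, true));  // prints DEBUG
    }
}
```

Whether the real fix should catch IllegalStateException in the loader or have
the reader throw an IOException instead is a design question for the bug
report; the sketch only shows why the exception type changes what operators
see.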
>
>
>
> On Thu, Dec 5, 2013 at 9:50 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>>
>>
>>
>> On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Eric,
>>> Thanks for your reply, I'm just now getting back to this as I had more
>>> of these the past two days. No tserver failures or master halts. With
>>> previous errors we were still experiencing network issues that were indeed
>>> taking tabletservers down, but now that they fixed a bad line card in a
>>> switch that had been rebooting itself (but not failing over), those issues
>>> are all gone (finally, knock on wood).
>>>
>>> Now that I see them again in isolation with no other errors, in the main
>>> tserver log these bloom-loader thread failures appear to happen out of the
>>> blue with no other issues surrounding them.
>>>
>>> However, I just checked the debug log and see they are occurring right
>>> at the time of a Major Compaction. E.g., from one of the tservers' debug logs:
>>>
>>> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
>>> 0.00 secs, wait 0.00 secs
>>> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
>>> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->