Accumulo >> mail # user >> Bloom filter thread failure errors


Re: Bloom filter thread failure errors
On Thu, Dec 5, 2013 at 3:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Keith,
> Here is the stack trace in the tserver DEBUG log for this most recent
> exception. The exception section is the same as what's in the main tserver
> log, but of course the MajC bits don't appear in the main log. This is
> hand-typed, but I'm pretty sure it's right.
>
>
> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
> 0.00 secs, wait 0.00 secs
> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
> [/t-0000aa9/C0000zn4.rf_tmp
> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
> closed
> java.lang.IllegalStateException: File
> /accumulo/tables/2/t-0000aa9/C0000zmf.rf is closed
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:244)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:142)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:211)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:307)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:357)
>   at
> org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:142)
>   at
> org.apache.accumulo.core.file.rfile.RFile$Reader.getMetaStore(Rfile.java:927)
>   at
> org.apache.accumulo.core.file.BloomFilterLayer$BloomFilterLoader$1.run(BloomFilterLayer.java:210)
>   at
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> 2013-12-03 11:48:14,900 [tabletserver.Compactor] DEBUG: Compaction 2;f;d
> 280 read | 280 written |  2,500 entries/sec | 0.112 secs
> 2013-12-03 11:48:14,924 [tabletserver.Tablet] DEBUG: MajC finish lock 0.00
> secs
>
> The biggest bummer here is just that it appears on the Monitor GUI as an
> error, and we all know how Operators don't like "errors" on their screens
> ;-)  But if this is one that can be safely ignored, we'll just have to
> write that up in a procedure somewhere.
>

The code in BloomFilterLayer$BloomFilterLoader$1.run() logs IOExceptions at
debug when the file is closed.  Because this is an IllegalStateException,
it's not being ignored.  Would you like to open a bug for this?
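
To make the difference concrete, here is a rough sketch of the two handling
paths. It is not the actual Accumulo source: the class, the interface, the
method names, the returned level strings, and the WARN level for an
IOException on a file that is still open are all assumptions made for
illustration. It only shows why an IllegalStateException thrown on a closed
file ends up at ERROR while an IOException in the same situation would only
reach DEBUG, and what one possible fix could look like.

```java
import java.io.IOException;

// Illustrative only: none of these names come from Accumulo.
final class BloomLoaderSketch {

    /** Hypothetical stand-in for the code that reads the bloom filter meta block. */
    interface MetaBlockLoad {
        void run() throws IOException;
    }

    /** Behaviour as described above: only IOException gets the "file closed" special case. */
    static String currentBehaviour(MetaBlockLoad load, boolean readerClosed) {
        try {
            load.run();
            return "NONE"; // nothing logged on success
        } catch (IOException e) {
            // Expected when a compaction closes the file under the loader thread.
            // WARN for the not-closed case is an assumption.
            return readerClosed ? "DEBUG" : "WARN";
        } catch (RuntimeException e) {
            // IllegalStateException lands here and is reported as an error.
            return "ERROR";
        }
    }

    /** One possible fix (an assumption): treat IllegalStateException like IOException. */
    static String possibleFix(MetaBlockLoad load, boolean readerClosed) {
        try {
            load.run();
            return "NONE";
        } catch (IOException | IllegalStateException e) {
            return readerClosed ? "DEBUG" : "WARN";
        } catch (RuntimeException e) {
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        MetaBlockLoad closedFile = () -> {
            throw new IllegalStateException("File is closed");
        };
        System.out.println(currentBehaviour(closedFile, true)); // prints ERROR
        System.out.println(possibleFix(closedFile, true));      // prints DEBUG
    }
}
```

With a change along these lines, the failure during a major compaction would
stay in the debug log and no longer show up as an error on the Monitor.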
>
>
>
> On Thu, Dec 5, 2013 at 9:50 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>>
>>
>>
>> On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Eric,
>>> Thanks for your reply, I'm just now getting back to this as I had more
>>> of these the past two days. No tserver failures or master halts. With
>>> previous errors we were still experiencing network issues that were indeed
>>> taking tabletservers down, but now that they fixed a bad line card in a
>>> switch that had been rebooting itself (but not failing over), those issues
>>> are all gone (finally, knock on wood).
>>>
>>> Now that I see them again in isolation with no other errors, in the main
>>> tserver log these bloom-loader thread failures appear to happen out of the
>>> blue with no other issues surrounding them.
>>>
>>> However, I just checked the debug log and see they are occurring right
>>> at the time of a Major Compaction.  E.g. from one of the tservers' debug log:
>>>
>>> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
>>> 0.00 secs, wait 0.00 secs
>>> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
>>> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->