Accumulo >> mail # user >> Bloom filter thread failure errors


Re: Bloom filter thread failure errors
Hi Keith,
Here is the stack trace in the tserver DEBUG log for this most recent
exception. The exception section is the same as what's in the main tserver
log, but of course the MajC bits don't appear in the main log. This is
hand-typed, but I'm pretty sure it's right.

2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
0.00 secs, wait 0.00 secs
2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
(NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
[/t-0000aa9/C0000zn4.rf_tmp
2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
"bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
closed
java.lang.IllegalStateException: File
/accumulo/tables/2/t-0000aa9/C0000zmf.rf is closed
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:244)
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:142)
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:211)
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:307)
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:357)
  at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:142)
  at
org.apache.accumulo.core.file.rfile.RFile$Reader.getMetaStore(RFile.java:927)
  at
org.apache.accumulo.core.file.BloomFilterLayer$BloomFilterLoader$1.run(BloomFilterLayer.java:210)
  at
org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
2013-12-03 11:48:14,900 [tabletserver.Compactor] DEBUG: Compaction 2;f;d
280 read | 280 written |  2,500 entries/sec | 0.112 secs
2013-12-03 11:48:14,924 [tabletserver.Tablet] DEBUG: MajC finish lock 0.00
secs
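
To put a number on the timing in the excerpt above, the gap between the "Starting MajC" line and the bloom-loader ERROR line can be computed from their timestamps. This is only a small sketch: the class and method names (`LogGap`, `millisBetween`) are made up for illustration, and it assumes every log line starts with a 23-character timestamp in the layout shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout at the start of each tserver log line in the excerpt.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Length of a timestamp such as "2013-12-03 11:48:14,739".
    private static final int TS_LENGTH = 23;

    // Milliseconds elapsed between the timestamps of two log lines.
    static long millisBetween(String earlierLine, String laterLine) {
        LocalDateTime earlier =
                LocalDateTime.parse(earlierLine.substring(0, TS_LENGTH), LOG_TS);
        LocalDateTime later =
                LocalDateTime.parse(laterLine.substring(0, TS_LENGTH), LOG_TS);
        return Duration.between(earlier, later).toMillis();
    }

    public static void main(String[] args) {
        String majcStart =
                "2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d";
        String bloomError =
                "2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread";
        System.out.println(millisBetween(majcStart, bloomError) + " ms"); // prints "41 ms"
    }
}
```

For the two lines quoted above, the gap is 41 ms.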

The biggest bummer here is just that it appears on the Monitor GUI as an
error, and we all know how Operators don't like "errors" on their screens
;-)  But if this is one that can be safely ignored, we'll just have to
write that up in a procedure somewhere.
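
For anyone writing that procedure, the kind of race the log seems to suggest can be sketched as below. This is not Accumulo code, and the thread has not confirmed the cause: it assumes a background loader thread asks for a metadata block after some other thread has already closed the file. `FakeReader`, `loadBloomFilter`, and the "bloom" block name are hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosedFileRace {
    // Hypothetical stand-in for a file reader that refuses reads once closed.
    static class FakeReader {
        private final String path;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        FakeReader(String path) {
            this.path = path;
        }

        void close() {
            closed.set(true);
        }

        byte[] getMetaBlock(String name) {
            if (closed.get()) {
                throw new IllegalStateException("File " + path + " is closed");
            }
            return new byte[0]; // placeholder for real block contents
        }
    }

    // Runs the load on a background thread and reports what happened
    // instead of letting the worker thread die with the exception.
    static String loadBloomFilter(FakeReader reader) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                try {
                    reader.getMetaBlock("bloom"); // assumed block name
                    return "loaded";
                } catch (IllegalStateException e) {
                    return "skipped: " + e.getMessage();
                }
            });
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader("/example/C0000zmf.rf");
        reader.close(); // another thread closes the file first
        System.out.println(loadBloomFilter(reader));
    }
}
```

In this sketch the late read only costs the loader its result and nothing else is affected. Whether the same holds for the real tserver is exactly the open question in this thread.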

On Thu, Dec 5, 2013 at 9:50 AM, Keith Turner <[EMAIL PROTECTED]> wrote:

>
>
>
> On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>
>> Hi Eric,
>> Thanks for your reply, I'm just now getting back to this as I had more of
>> these the past two days. No tserver failures or master halts. With previous
>> errors we were still experiencing network issues that were indeed taking
>> tabletservers down, but now that they fixed a bad line card in a switch
>> that had been rebooting itself (but not failing over), those issues are all
>> gone (finally, knock on wood).
>>
>> Now that I see them again in isolation with no other errors, in the main
>> tserver log these bloom-loader thread failures appear to happen out of the
>> blue with no other issues surrounding them.
>>
>> However, I just checked the debug log and see they are occurring right at
>> the time of a Major Compaction.  E.g. from one of the tservers' debug logs:
>>
>> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
>> 0.00 secs, wait 0.00 secs
>> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
>> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
>> [/t-0000aa9/C0000zn4.rf_tmp
>> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
>> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
>> closed
>>
>> The rest of the stack looks like what I posted earlier. The very next
>> debug log message after the bloom loader exception shows that the
>> Compaction completed successfully in 0.112 seconds.
>>
>> So it looks like the bloom loader is trying to open an rfile 41ms after a