Accumulo >> mail # user >> Bloom filter thread failure errors


Re: Bloom filter thread failure errors
On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Eric,
> Thanks for your reply, I'm just now getting back to this as I had more of
> these the past two days. No tserver failures or master halts. With previous
> errors we were still experiencing network issues that were indeed taking
> tabletservers down, but now that they fixed a bad line card in a switch
> that had been rebooting itself (but not failing over), those issues are all
> gone (finally, knock on wood).
>
> Now that I see them again in isolation with no other errors, in the main
> tserver log these bloom-loader thread failures appear to happen out of the
> blue with no other issues surrounding them.
>
> However, I just checked the debug log and see they are occurring right at
> the time of a Major Compaction.  E.g. from one of the tservers debug log:
>
> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
> 0.00 secs, wait 0.00 secs
> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
> [/t-0000aa9/C0000zn4.rf_tmp
> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
> closed
>
> The rest of the stack looks like what I posted earlier. The very next
> debug log message after the bloom loader exception shows that the
> Compaction completed successfully in 0.112 seconds.
>
> So it looks like the bloom loader is trying to open an rfile 41ms after a
> compaction started, and the file was likely just compacted during that gap
> between the calls. If that's the case, can this error be safely ignored?
>

It's probably safe to ignore.  Bloom filters are loaded lazily by a
background thread, and it's possible the file will be closed by the time the
background thread gets around to loading it.  However, it should log at
DEBUG level in this case, so I am curious why an ERROR is logged.  Is there
a stack trace associated with the message 'Thread "bloom-loader-41" ...' ?
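
To make the race concrete, here is a rough sketch of the pattern I mean. It is
not the Accumulo code: SketchReader, Outcome and loadBloom are hypothetical
names made up for illustration, and the DEBUG line is just a println. It only
shows a background task treating "file is closed" as an expected outcome
instead of letting the exception kill the thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins for illustration; these are not Accumulo classes.
class LazyBloomLoadSketch {

    /** Stand-in for an rfile reader that a compaction may close at any time. */
    static class SketchReader {
        private final String path;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        SketchReader(String path) { this.path = path; }

        void close() { closed.set(true); }

        /** Mimics the failure in this thread: reading after close throws. */
        byte[] readBloomBlock() {
            if (closed.get()) {
                throw new IllegalStateException("File " + path + " is closed");
            }
            return new byte[] {1, 2, 3}; // placeholder bytes, not a real filter
        }
    }

    enum Outcome { LOADED, SKIPPED_FILE_CLOSED }

    /** Body of the background task: a closed file is an expected race, not an error. */
    static Outcome loadBloom(SketchReader reader) {
        try {
            reader.readBloomBlock();
            return Outcome.LOADED;
        } catch (IllegalStateException e) {
            // Expected when the file was closed first, so report it quietly.
            System.out.println("DEBUG: bloom filter not loaded: " + e.getMessage());
            return Outcome.SKIPPED_FILE_CLOSED;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService loader = Executors.newSingleThreadExecutor();
        try {
            SketchReader open = new SketchReader("/example/open.rf");
            SketchReader compacted = new SketchReader("/example/compacted.rf");
            compacted.close(); // simulate a compaction finishing before the load runs

            Future<Outcome> first = loader.submit(() -> loadBloom(open));
            Future<Outcome> second = loader.submit(() -> loadBloom(compacted));
            System.out.println(first.get());
            System.out.println(second.get());
        } finally {
            loader.shutdown();
        }
    }
}
```

In this sketch the second load is skipped and nothing is lost, since a bloom
filter is only an optimization and the file it belonged to is no longer being
read.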
>
> Thanks,
> Terry
>
>
>
> On Mon, Nov 18, 2013 at 8:56 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> This is an educated guess...
>>
>> When a process dies "gracefully" there's a shutdown hook that closes the
>> FileSystem.  That can result in messages like this.  It's likely there's an
>> error before this about a zookeeper session being lost, or a halt issued by
>> the master.  See if this tserver died shortly after this message. If so,
>> ignore the message.
>>
>> -Eric
>>
>>
>>
>> On Fri, Nov 15, 2013 at 4:31 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Greetings folks,
>>> In my Accumulo 1.4.2 cluster I am seeing ERRORS about bloom loader
>>> threads dying due to an rfile being closed.  I can't copy/paste the error
>>> as it's on an air-gapped system, but it starts with:
>>>
>>> ERROR Thread "bloom-loader-2147" died File
>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>>   java.lang.IllegalStateException: File
>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>>     at
>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.getBCFile(CacheableBlockFile.java:244)
>>>     at
>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.access$000(CacheableBlockFile.java:142)
>>> (10 more java files ... ends with java.lang.Thread.run(UnknownSource) )
>>>
>>> No real rhyme or reason as to when they occur; we are predominantly
>>> ingest heavy with light reads by rowkey with ~10 entries per rowkey.  I
>>> don't really know if client programs are getting errors when these occur or
>>> not.
>>>
>>> I didn't find any JIRAs related to these.  Should I be concerned about
>>> these?
>>>
>>
>>
>