Accumulo, mail # user - Bloom filter thread failure errors


Re: Bloom filter thread failure errors
Keith Turner 2013-12-05, 15:50
On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Eric,
> Thanks for your reply. I'm just now getting back to this, as I had more of
> these over the past two days. No tserver failures or master halts. When the
> previous errors occurred we were still experiencing network issues that were
> indeed taking tabletservers down, but now that they have fixed the bad line
> card in a switch that had been rebooting itself (but not failing over), those
> issues are all gone (finally, knock on wood).
>
> Now that I see them again in isolation, in the main tserver log these
> bloom-loader thread failures appear to happen out of the blue, with no other
> errors surrounding them.
>
> However, I just checked the debug log and see that they occur right at the
> time of a Major Compaction. For example, from one of the tservers' debug logs:
>
> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
> 0.00 secs, wait 0.00 secs
> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
> [/t-0000aa9/C0000zn4.rf_tmp
> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
> closed
>
> The rest of the stack looks like what I posted earlier. The very next
> debug log message after the bloom loader exception shows that the
> compaction completed successfully in 0.112 seconds.
>
> So it looks like the bloom loader is trying to open an rfile 41ms after a
> compaction started, and the file was likely just compacted during that gap
> between the calls. If that's the case, can this error be safely ignored?
>

It's probably safe to ignore. Bloom filters are loaded lazily by a
background thread, and it's possible the file will be closed by the time the
background thread gets around to loading it. However, it should log a debug
message in this case, so I am curious why an ERROR is logged. Is there a
stack trace associated with the message 'Thread "bloom-loader-41" ...' ?
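
For context, the lazy-load pattern looks roughly like the sketch below
(hypothetical code with made-up names like BloomLoaderSketch and RFileReader,
not the actual Accumulo classes):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Rough sketch only; not the real Accumulo implementation.
  class BloomLoaderSketch {

    // Stand-in for a per-file reader; loadBloomFilter() throws
    // IllegalStateException if a major compaction already closed the rfile.
    interface RFileReader {
      void loadBloomFilter();
    }

    private final ExecutorService bloomLoadPool = Executors.newCachedThreadPool();

    void scheduleBloomLoad(final RFileReader reader) {
      bloomLoadPool.submit(new Runnable() {
        public void run() {
          try {
            // This can race with a MajC that closes and removes the file.
            reader.loadBloomFilter();
          } catch (IllegalStateException e) {
            // Expected, harmless race: the file went away before the lazy
            // load ran. This should be logged at DEBUG rather than letting
            // the thread die and surface as an ERROR.
            System.out.println("DEBUG: bloom filter not loaded, file closed: "
                + e.getMessage());
          }
        }
      });
    }
  }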
>
> Thanks,
> Terry
>
>
>
> On Mon, Nov 18, 2013 at 8:56 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> This is an educated guess...
>>
>> When a process dies "gracefully", there's a shutdown hook that closes the
>> FileSystem. That can result in messages like this. It's likely there's an
>> error before this about a ZooKeeper session being lost, or a halt issued by
>> the master. See if this tserver died shortly after this message. If so,
>> ignore the message.
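>>
>> Roughly the mechanism, as a made-up sketch (FileSystem.get() and
>> FileSystem.closeAll() are real Hadoop calls; the rest is illustrative only):
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.FileSystem;
>>
>>   public class ShutdownSketch {
>>     public static void main(String[] args) throws Exception {
>>       // Hadoop registers a hook like this internally for its FileSystem
>>       // cache; the sketch just makes the behavior visible.
>>       Runtime.getRuntime().addShutdownHook(new Thread() {
>>         public void run() {
>>           try {
>>             FileSystem.closeAll(); // closes every cached FileSystem on exit
>>           } catch (Exception e) {
>>             // nothing useful to do during shutdown
>>           }
>>         }
>>       });
>>
>>       FileSystem fs = FileSystem.get(new Configuration()); // cached instance
>>       // Any background thread (such as a bloom-loader) still reading
>>       // through fs after the hook runs will see "... is closed" errors.
>>     }
>>   }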
>>
>> -Eric
>>
>>
>>
>> On Fri, Nov 15, 2013 at 4:31 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Greetings folks,
>>> In my Accumulo 1.4.2 cluster I am seeing ERRORS about bloom loader
>>> threads dying due to an rfile being closed.  I can't copy/paste the error
>>> as it's on an air-gapped system, but it starts with:
>>>
>>> ERROR Thread "bloom-loader-2147" died File
>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>>   java.lang.IllegalStateException: File
>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>>     at
>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.getBCFile(CacheableBlockFile.java:244)
>>>     at
>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.access$000(CacheableBlockFile.java:142)
>>> (10 more stack frames ... ending with java.lang.Thread.run(Unknown Source))
>>>
>>> There's no real rhyme or reason as to when they occur; we are predominantly
>>> ingest-heavy, with light reads by rowkey (~10 entries per rowkey). I don't
>>> really know whether client programs are getting errors when these occur or
>>> not.
>>>
>>> I didn't find any JIRAs related to these errors. Should I be concerned
>>> about them?
>>>
>>
>>
>