HBase >> mail # user >> one of our datanodes stops working after few hours


Re: one of our datanodes stops working after few hours
As requested:

http://pastebin.com/aySaTADp

Note the blocked threads.
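
For anyone skimming a dump like this, here is a minimal sketch (not something used in this thread) that tallies thread states and the monitors that blocked threads are waiting on. It assumes the standard HotSpot jstack text layout, with `java.lang.Thread.State:` and `- waiting to lock <addr> (a Class)` lines. The function name `summarize_jstack` and the file name in the usage comment are hypothetical.

```python
import re
from collections import Counter

# State line printed under each thread header, e.g.
#    java.lang.Thread.State: BLOCKED (on object monitor)
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")

# Monitor a blocked thread is waiting for, e.g.
#    - waiting to lock <0x00000000e0a1b2c8> (a java.lang.Object)
LOCK_RE = re.compile(r"- waiting to lock <(0x[0-9a-fA-F]+)> \(a ([^)]+)\)")


def summarize_jstack(text):
    """Return (state counts, contended-lock counts) for a jstack dump.

    The lock counter is keyed by (address, class name), so the most
    common key is the monitor that the most threads are waiting on.
    """
    states = Counter(STATE_RE.findall(text))
    locks = Counter(LOCK_RE.findall(text))
    return states, locks


# Hypothetical usage, assuming the dump was saved to a local file:
#   with open("datanode.jstack") as f:
#       states, locks = summarize_jstack(f.read())
#   print(states.most_common())
#   print(locks.most_common(5))
```

If one address dominates the lock counts, searching the dump for `- locked <that address>` should show which thread currently holds the monitor.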

-Jack

On Mon, May 2, 2011 at 2:39 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> I think Todd was asking to have a jstack without yourkit, so it
> shouldn't be an issue for you :)
>
> J-D
>
> On Mon, May 2, 2011 at 1:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> my yourkit version expired :)... but here is the jstack when it
>> happens: http://pastebin.com/5v6mHg3t
>>
>> On Mon, May 2, 2011 at 1:00 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>> On Mon, May 2, 2011 at 12:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>>
>>>> Tried removing YourKit and running on the Sun JDK, same thing.  We have some
>>>> threads blocked; does anyone know what they are blocked on?
>>>>
>>>
>>> Which threads are blocked? Can you get some jstacks without yourkit?
>>>
>>> -Todd
>>>
>>>
>>>>
>>>> -Jack
>>>>
>>>> On Mon, May 2, 2011 at 7:53 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>>> > Hi Jack,
>>>> >
>>>> > Does this happen even if you aren't running Yourkit on the DN?
>>>> >
>>>> > Can you try using a Sun JDK instead of OpenJDK?
>>>> >
>>>> > -Todd
>>>> >
>>>> > On Sun, May 1, 2011 at 7:34 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> Version:         0.20.2+320 hdfs
>>>> >> .89 HBASE
>>>> >>
>>>> >> ulimit is 32k
>>>> >> xcievers is 5k
>>>> >>
>>>> >> Note from the jstack, I am not exceeding xcievers.
>>>> >>
>>>> >> -Jack
>>>> >>
>>>> >> On Sun, May 1, 2011 at 6:19 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>>> >> >
>>>> >> >
>>>> >> > What's your xceivers set to?
>>>> >> > What's the ulimit -n set to for the hdfs/hadoop user? (You didn't say
>>>> >> > which release/version you were using.)
>>>> >> >
>>>> >> >> Date: Sun, 1 May 2011 17:47:18 -0700
>>>> >> >> Subject: one of our datanodes stops working after few hours
>>>> >> >> From: [EMAIL PROTECTED]
>>>> >> >> To: [EMAIL PROTECTED]
>>>> >> >>
>>>> >> >> I took a jstack (http://pastebin.com/5v6mHg3t).  After a few hours, it
>>>> >> >> literally staggers to a halt and gets very, very slow... Any ideas
>>>> >> >> what it's blocking on?
>>>> >> >> (The main issue is that fs reads for the RS get really slow when that
>>>> >> >> happens.)
>>>> >> >>
>>>> >> >> -Jack
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Todd Lipcon
>>>> > Software Engineer, Cloudera
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>