HBase >> mail # user >> one of our datanodes stops working after few hours


Re: one of our datanodes stops working after few hours
I will try, thanks.  I have not run NFS since 1998 :).

-Jack

On Mon, May 2, 2011 at 10:10 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> Hi Jack,
>
> Try turning off your clienttrace logs in the DN log4j.properties, perhaps?
>
> By any chance do you log to NFS?
>
> Your blocked threads all seem to be waiting on appends to log4j.
>
> -Todd
>
> On Mon, May 2, 2011 at 7:29 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>
>> As requested:
>>
>> http://pastebin.com/aySaTADp
>>
>> Note, blocked threads.
>>
>> -Jack
>>
>> On Mon, May 2, 2011 at 2:39 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
>> wrote:
>> > I think Todd was asking to have a jstack without yourkit, so it
>> > shouldn't be an issue for you :)
>> >
>> > J-D
>> >
>> > On Mon, May 2, 2011 at 1:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> >> my yourkit version expired :)... but here is the jstack when it
>> >> happens: http://pastebin.com/5v6mHg3t
>> >>
>> >> On Mon, May 2, 2011 at 1:00 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>> >>> On Mon, May 2, 2011 at 12:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>>> Tried removing YourKit and running on the Sun JDK; same thing.  We have
>> >>>> some threads blocked. Does anyone know what they block on?
>> >>>>
>> >>>
>> >>> Which threads are blocked? Can you get some jstacks without yourkit?
>> >>>
>> >>> -Todd
>> >>>
>> >>>
>> >>>>
>> >>>> -Jack
>> >>>>
>> >>>> On Mon, May 2, 2011 at 7:53 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>> >>>> > Hi Jack,
>> >>>> >
>> >>>> > Does this happen even if you aren't running Yourkit on the DN?
>> >>>> >
>> >>>> > Can you try using a Sun JDK instead of OpenJDK?
>> >>>> >
>> >>>> > -Todd
>> >>>> >
>> >>>> > On Sun, May 1, 2011 at 7:34 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> >>>> >
>> >>>> >> Version:         0.20.2+320 hdfs
>> >>>> >> .89 HBASE
>> >>>> >>
>> >>>> >> ulimit is 32k
>> >>>> >> xcievers is 5k
>> >>>> >>
>> >>>> >> Note from the jstack, I am not exceeding xcievers.
>> >>>> >>
>> >>>> >> -Jack
>> >>>> >>
>> >>>> >> On Sun, May 1, 2011 at 6:19 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > What's your xceivers set to?
>> >>>> >> > What's ulimit -n set to for the hdfs/hadoop user? (You didn't say
>> >>>> >> > which release/version you were using.)
>> >>>> >> >
>> >>>> >> >> Date: Sun, 1 May 2011 17:47:18 -0700
>> >>>> >> >> Subject: one of our datanodes stops working after few hours
>> >>>> >> >> From: [EMAIL PROTECTED]
>> >>>> >> >> To: [EMAIL PROTECTED]
>> >>>> >> >>
>> >>>> >> >> I took a jstack (http://pastebin.com/5v6mHg3t).  After a few
>> >>>> >> >> hours, it literally staggers to a halt and gets very, very
>> >>>> >> >> slow... Any ideas what it's blocking on?
>> >>>> >> >> (The main issue is that fsreads for the RS get really slow when
>> >>>> >> >> that happens.)
>> >>>> >> >>
>> >>>> >> >> -Jack
>> >>>> >> >
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Todd Lipcon
>> >>>> > Software Engineer, Cloudera
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Todd Lipcon
>> >>> Software Engineer, Cloudera
>> >>>
>> >>
>> >
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
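
For anyone following the same diagnosis ("Which threads are blocked?" and "Your blocked threads all seem to be waiting on appends to log4j"), here is a rough sketch of how the blocked threads in a jstack dump could be tallied by their top stack frame. It is not what anyone in this thread ran. The function name `blocked_top_frames` and the `SAMPLE` text are made up for illustration, and the sample is not taken from the pastebins above; it only assumes the usual HotSpot jstack layout (quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames).

```python
import re
from collections import Counter

# Patterns for the usual HotSpot jstack layout (an assumption, see note above).
THREAD_HEADER = re.compile(r'^"[^"]+"')
STATE_LINE = re.compile(r'java\.lang\.Thread\.State:\s+(\w+)')
FRAME_LINE = re.compile(r'^\s*at\s+(\S+)')


def blocked_top_frames(dump_text):
    """Count the top stack frame of every BLOCKED thread in a jstack dump."""
    counts = Counter()
    state = None        # state of the thread currently being read
    seen_frame = False  # True once the top frame of that thread is recorded
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            # A new thread section starts; forget the previous thread.
            state = None
            seen_frame = False
            continue
        m = STATE_LINE.search(line)
        if m:
            state = m.group(1)
            continue
        m = FRAME_LINE.match(line)
        if m and state == "BLOCKED" and not seen_frame:
            # Keep only "package.Class.method", dropping "(File.java:NN)".
            counts[m.group(1).split("(")[0]] += 1
            seen_frame = True
    return counts


# Illustrative dump only; NOT the contents of the pastebins in this thread.
SAMPLE = """\
"DataXceiver-1" daemon prio=10 tid=0x01 nid=0x11 waiting for monitor entry [0x01]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:201)
    at org.apache.log4j.Category.forcedLog(Category.java:388)

"DataXceiver-2" daemon prio=10 tid=0x02 nid=0x12 waiting for monitor entry [0x02]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:201)

"IPC Server handler 0" daemon prio=10 tid=0x03 nid=0x13 runnable [0x03]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
"""

if __name__ == "__main__":
    for frame, n in blocked_top_frames(SAMPLE).most_common():
        print(n, frame)
```

To run it against a real dump, read the saved jstack output into a string and pass it to `blocked_top_frames`. If most of the blocked threads share a log4j frame at the top, that would be consistent with Todd's reading of the dump.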