

Re: High load on datanode startup
Looks like you have some under-replicated blocks. Does that number
decrease if you run fsck multiple times?
Regards,
Serge
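
One way to track this across runs is sketched below. It is only an illustration, not something posted in the thread: the function names `parse_fsck_summary` and `is_shrinking` are hypothetical, and the parser assumes the summary lines look like the fsck output quoted further down (a label, a colon, then an integer count). You would feed it the saved text of each `hadoop fsck /` run.

```python
import re

# Summary labels as they appear in the fsck output quoted in this thread.
FIELDS = ("Under-replicated blocks", "Missing replicas", "Corrupt blocks")


def parse_fsck_summary(text):
    """Extract integer counts for the labels in FIELDS from fsck output.

    A leading ">" left over from email quoting is tolerated. Labels that
    are absent from the text are simply left out of the result.
    """
    counts = {}
    for name in FIELDS:
        pattern = r"^[> \t]*" + re.escape(name) + r":\s+(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            counts[name] = int(match.group(1))
    return counts


def is_shrinking(samples):
    """Return True if the under-replicated count strictly decreases
    from each parsed run to the next."""
    values = [sample["Under-replicated blocks"] for sample in samples]
    return all(later < earlier for earlier, later in zip(values, values[1:]))
```

If the count keeps falling between runs, the namenode is probably still re-replicating and the cluster should settle by itself; a count that stays flat suggests the replicas cannot be placed, for example because datanodes are down.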

On 5/9/12 12:23 PM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:

>On Wed, May 9, 2012 at 6:04 PM, Serge Blazhiyevskyy <
>[EMAIL PROTECTED]> wrote:
>
>>
>> What does the response from fsck look like?
>>
>>
>[snip lots of stuff about under replicated blocks]
>
>......Status: HEALTHY
> Total size:    246858876262 B (Total open files size: 372 B)
> Total dirs:    14914
> Total files:   39248 (Files currently being written: 4)
> Total blocks (validated):      40657 (avg. block size 6071743 B) (Total open file blocks (not validated): 4)
> Minimally replicated blocks:   40657 (100.0 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       1410 (3.4680374 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     2.9911454
> Corrupt blocks:                0
> Missing replicas:              2831 (2.3279145 %)
> Number of data-nodes:          5
> Number of racks:               1
>FSCK ended at Wed May 09 19:19:11 UTC 2012 in 980 milliseconds
>
>
>Further information to add to this: it appears to be affecting 2 nodes in
>the cluster, one more than the other. In the last couple of hours
>one of the nodes has also experienced high load; this has now dropped, but
>both of these nodes are now considered dead by the namenode. The load on the
>first box is still increasing, currently 234! I think I might have to reboot
>it via IPMI.
>
>
>>
>> hadoop fsck /
>>
>>
>> It might be the case that some of the blocks are misreplicated
>>
>>
>> Serge
>>
>> Hadoopway.blogspot.com
>>
>>
>>
>>
>>
>> On 5/9/12 9:58 AM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:
>>
>> >On Wed, May 9, 2012 at 5:56 PM, Serge Blazhiyevskyy <
>> >[EMAIL PROTECTED]> wrote:
>> >
>> >> Take a look at your data distribution for that cluster. Maybe it is
>> >> unbalanced.
>> >>
>> >>
>> >> Run the balancer if it is...
>> >>
>> >
>> >The cluster is balanced; I ran the balancer yesterday. Oddly enough, the
>> >problem started after I had run the balancer.
>> >
>> >I'm running CDH3, by the way.
>> >
>> >
>> >
>> >>
>> >> Regards,
>> >> Serge
>> >>
>> >> hadoopway.blogspot.com
>> >>
>> >>
>> >>
>> >> On 5/9/12 9:52 AM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> >Hi,
>> >> >
>> >> >I wonder if someone could give me some pointers on a problem I'm having?
>> >> >
>> >> >I have a 7-machine cluster set up for testing, and we have been pouring
>> >> >data into it for a week without issue. We have learnt several things
>> >> >along the way and solved all the problems up to now by searching online,
>> >> >but now I'm stuck. One of the data nodes decided to have a load of 70+
>> >> >this morning. Stopping the datanode and tasktracker brought it back to
>> >> >normal, but every time I start the datanode again the load shoots
>> >> >through the roof, and all I get in the logs is:
>> >> >
>> >> >STARTUP_MSG: Starting DataNode
>> >> >
>> >> >
>> >> >STARTUP_MSG:   host = pl464/10.20.16.64
>> >> >
>> >> >
>> >> >STARTUP_MSG:   args = []
>> >> >
>> >> >
>> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
>> >> >
>> >> >
>> >> >STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
>> >> >-************************************************************/
>> >> >
>> >> >
>> >> >2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>> >> >
>> >> >2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>> >> >
>> >> >Nothing else.
>> >> >
>> >> >The load seems to max out only 1 of the CPUs, but the machine becomes
>> >> >*very* unresponsive.
>> >> >