

Re: High load on datanode startup
Looks like you have some under-replicated blocks. Does that number
decrease if you run fsck multiple times?
Regards,
Serge
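
One way to check whether the under-replicated count is shrinking between runs is sketched below. This is only an illustration: it assumes the `hadoop` CLI is on the PATH and that the summary line has the same "Under-replicated blocks:" format as the fsck output quoted in this thread. The function names are made up for the example.

```python
import re
import subprocess

# Matches the summary line as it appears in the fsck output quoted in this thread.
UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s+(\d+)")


def under_replicated_count(fsck_output):
    """Return the under-replicated block count from fsck text, or None if absent."""
    match = UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def run_fsck(path="/"):
    """Run `hadoop fsck <path>` and return its stdout (assumes `hadoop` is on PATH)."""
    result = subprocess.run(["hadoop", "fsck", path], capture_output=True, text=True)
    return result.stdout


def is_decreasing(counts):
    """True if every count is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(counts, counts[1:]))
```

Calling `under_replicated_count(run_fsck())` a few times, some minutes apart, and passing the collected values to `is_decreasing` would show whether the namenode is making progress on re-replication.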

On 5/9/12 12:23 PM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:

>On Wed, May 9, 2012 at 6:04 PM, Serge Blazhiyevskyy <
>[EMAIL PROTECTED]> wrote:
>
>>
>> What does the response from fsck look like?
>>
>>
>[snip lots of stuff about under replicated blocks]
>
>......Status: HEALTHY
> Total size:    246858876262 B (Total open files size: 372 B)
> Total dirs:    14914
> Total files:   39248 (Files currently being written: 4)
> Total blocks (validated):      40657 (avg. block size 6071743 B) (Total open file blocks (not validated): 4)
> Minimally replicated blocks:   40657 (100.0 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       1410 (3.4680374 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     2.9911454
> Corrupt blocks:                0
> Missing replicas:              2831 (2.3279145 %)
> Number of data-nodes:          5
> Number of racks:               1
>FSCK ended at Wed May 09 19:19:11 UTC 2012 in 980 milliseconds
>
>
>Further information to add to this: it appears to be affecting 2 nodes in
>the cluster, one more than the other though.  In the last couple of hours
>one of the nodes has also experienced high load. This has now dropped, but
>both of these nodes are now considered dead by the namenode.  The first
>box's load is still increasing, currently 234! I think I might have to
>reboot it via IPMI.
>
>
>>
>> hadoop fsck /
>>
>>
>> It might be the case that some of the blocks are misreplicated
>>
>>
>> Serge
>>
>> Hadoopway.blogspot.com
>>
>>
>>
>>
>>
>> On 5/9/12 9:58 AM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:
>>
>> >On Wed, May 9, 2012 at 5:56 PM, Serge Blazhiyevskyy <
>> >[EMAIL PROTECTED]> wrote:
>> >
>> >> Take a look at your data distribution for that cluster. Maybe, it is
>> >> unbalanced.
>> >>
>> >>
>> >> Run balancer, if it is...
>> >>
>> >
>> >The cluster is balanced, I ran balancer yesterday.  Oddly enough the
>> >problem started after I had run the balancer.
>> >
>> >I'm running CDH3 btw.
>> >
>> >
>> >
>> >>
>> >> Regards,
>> >> Serge
>> >>
>> >> hadoopway.blogspot.com
>> >>
>> >>
>> >>
>> >> On 5/9/12 9:52 AM, "Darrell Taylor" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> >Hi,
>> >> >
>> >> >I wonder if someone could give some pointers with a problem I'm having?
>> >> >
>> >> >I have a 7 machine cluster set up for testing and we have been pouring
>> >> >data into it for a week without issue. We have learnt several things
>> >> >along the way and solved all the problems up to now by searching online,
>> >> >but now I'm stuck.  One of the data nodes decided to have a load of 70+
>> >> >this morning. Stopping the datanode and tasktracker brought it back to
>> >> >normal, but every time I start the datanode again the load shoots
>> >> >through the roof, and all I get in the logs is:
>> >> >
>> >> >STARTUP_MSG: Starting DataNode
>> >> >STARTUP_MSG:   host = pl464/10.20.16.64
>> >> >STARTUP_MSG:   args = []
>> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
>> >> >STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
>> >> >-************************************************************/
>> >> >
>> >> >
>> >> >2012-05-09 16:12:05,925 INFO
>> >> >org.apache.hadoop.security.UserGroupInformation: JAAS Configuration
>> >> >already set up for Hadoop, not re-installing.
>> >> >
>> >> >2012-05-09 16:12:06,139 INFO
>> >> >org.apache.hadoop.security.UserGroupInformation: JAAS Configuration
>> >> >already set up for Hadoop, not re-installing.
>> >> >
>> >> >Nothing else.
>> >> >
>> >> >The load seems to max out only 1 of the CPUs, but the machine becomes
>> >> >*very* unresponsive.
>> >> >