Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hadoop >> mail # user >> Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.


Copy link to this message
-
Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.
I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the performance
is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
speed:1111
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
speed:809

The there are below info in Active NameNode:
2013-08-14 12:44:47,301 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000655178418 size 2634222089 bytes
The there are below info in StadnbyNameNode:
2013-08-14 12:43:57,924 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2421083 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-14 12:43:57,925 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
using no compression
2013-08-14 12:48:58,044 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
250.75s at 0.00 KB/s
2013-08-14 12:48:58,045 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 655178418 to namenode at 10.232.98.77:20021

*When Active NameNode download the fsimage, the OPS is less than 500, the
OPS dropped by 50%.*

The "checkpoint.png" attachment is monitoring info in ganglia.
I have two questions:

   - *first question:*  The Active NameNode's bandwidth is occupied by
   50%,  Active NameNode  still has 50M bandwidth to be used to handle RPC
   request, why the OPS dropped by 50%?
   - *second question:* The fsimage file is 2634222089 bytes, Active
   NameNode spend 240s to download the file. Yhe
   "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
   time should is about 50s.
Thansk,

LiuLei