Xiaohan 2012-10-24, 14:44
In our production environment, we have encountered a problem with NameNode performance. We configured the NameNode's shared storage with BookKeeper. Our Hadoop version is 2.0.1 and BK is 4.1.0.
The problem is: after the HDFS system has run for a while (2-3 days), performance decreases dramatically. The benchmark with nnbench from hadoop-mapreduce-client-jobclient-2.0.1-tests.jar looks like this:
First we ran:
./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 10
We got:
12/10/20 20:05:43 INFO hdfs.NNBench: TPS: Create/Write/Close: 52
Two days later, we got:
12/10/23 18:34:42 INFO hdfs.NNBench: TPS: Create/Write/Close: 1
(The "Avg exec time (ms): Create/Write/Close:" value is also much larger, possibly more than 1000 ms, so the real TPS may be even lower than the reported 1 because of the limited precision.)
In the NameNode logs, we found this difference between the two runs:
2012-10-20 20:05:43,249 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677
2012-10-22 18:34:42,223 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312
We suspect that the problem is in BookKeeper. Has anyone encountered this, or does anyone have a clue about it? Thanks very much. The environment is strictly controlled and the logs can only be copied by hand, so the logs are not very detailed.
Ivan Kelly 2012-10-24, 15:10
On Wed, Oct 24, 2012 at 02:44:45PM +0000, Xiaohan wrote:
> And the logs in NameNode, we found the difference from each of the times:
>
> 2012-10-20 20:05:43,249 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677
>
> 2012-10-22 18:34:42,223 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312
>
> We inspect that it is the problem of Bookkeeper. Anyone ever encounter that or any clue for that? Thanks very much.
> The environment is strictly controlled, and the logs can only be copied by hand. So the logs are not so detailed.

How many bookies are you using? Are any of the bookies displaying disk errors? What does iostat say on the bookies and on the namenode?
It does look like the editlog is the culprit here. However, it's not clear that it's BK. If BK is the shared edits, it should be second in the list of journals. From the sync times, the second journal seems to be performing fine.
-Ivan
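The per-journal comparison above can be made concrete by dividing each SyncTimes value by the number of syncs. The following is only a minimal sketch: it assumes that each number after "SyncTimes(ms):" is the cumulative sync time of one journal, in the configured order, and that the sync count applies to every journal. The function name avg_sync_ms is hypothetical and not part of Hadoop.

```python
import re

# Matches the FSEditLog statistics line quoted in this thread.
PATTERN = re.compile(r"Number of syncs:\s*(\d+)\s+SyncTimes\(ms\):\s*([\d\s]+)")

def avg_sync_ms(line):
    """Return the average time per sync, in ms, for each journal.

    Assumption: every SyncTimes value is a cumulative per-journal total.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no sync statistics found in line")
    syncs = int(match.group(1))
    totals = [int(t) for t in match.group(2).split()]
    return [total / syncs for total in totals]

first = "FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677"
later = "FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312"

for label, line in (("first run", first), ("later run", later)):
    print(label, [round(ms, 1) for ms in avg_sync_ms(line)])
```

Under those assumptions, the first journal goes from about 10.5 ms per sync to about 677.5 ms per sync, while the second journal goes from about 2.7 ms to about 6.1 ms, which matches the observation that the second journal still looks healthy.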