Problem sending metrics to multiple targets
Ivan Tretyakov 2013-01-17, 15:17
Hi!
We have the following problem. There are three target hosts to send metrics to: 192.168.1.111:8649, 192.168.1.113:8649, 192.168.1.115:8649 (node01, node03, node05). But the datanode, for example (using org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31), sends some metrics to the first target host and others to the second and third. As a result, some metrics are missing on the second and third nodes, and when gmetad collects metrics from one of those we cannot see certain metrics in Ganglia.

For example, node07 runs only one process that sends metrics to Ganglia, the datanode process, and we can see the following using tcpdump.

Dumping traffic for about three minutes:

    $ sudo -i tcpdump dst port 8649 and src host node07 | tee tcpdump.out
    ...
    $ head -n1 tcpdump.out
    12:18:05.559719 IP node07.dom.local.43350 > node01.dom.local.8649: UDP, length 180
    $ tail -n1 tcpdump.out
    12:20:59.575144 IP node

Then count the packets and bytes sent to each target:

    $ grep node01 tcpdump.out | wc -l
    5972
    $ grep node03 tcpdump.out | wc -l
    3812
    $ grep node05 tcpdump.out | wc -l
    3811
    $ grep node01 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
    1048272
    $ grep node03 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
    731604
    $ grep node05 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
    731532

We can also ask the gmond daemons which metrics they have:

    $ nc node01 8649 | grep ProcessName_DataNode | head -n1
    <METRIC NAME="jvm.JvmMetrics.ProcessName_DataNode.LogFatal" VAL="0" TYPE="float" UNITS="" TN="0" TMAX="60" DMAX="0" SLOPE="positive">
    $ nc node03 8649 | grep ProcessName_DataNode | head -n1
    $ nc node05 8649 | grep ProcessName_DataNode | head -n1
    $ nc node01 8649 | grep ProcessName_DataNode | wc -l
    100
    $ nc node03 8649 | grep ProcessName_DataNode | wc -l
    0
    $ nc node05 8649 | grep ProcessName_DataNode | wc -l
    0

So only the first collector node in the list has these metrics.
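For anyone who wants to repeat the per-target counting in one pass instead of one grep per host, here is a minimal sketch. It assumes the tcpdump text layout shown in the capture (destination host and port in field 5, UDP length in field 8); the function name summarize_targets is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Summarize tcpdump text output per destination host.
# Assumed line layout (taken from the capture in this post):
#   12:18:05.559719 IP node07.dom.local.43350 > node01.dom.local.8649: UDP, length 180
# Field 5 is "host.port:" and field 8 is the UDP payload length.
# Prints one line per destination: "<host> <packets> <bytes>", sorted by host.
summarize_targets() {
    awk '
    # Only accept complete packet lines; a truncated last line is skipped.
    $2 == "IP" && $4 == ">" && $8 ~ /^[0-9]+$/ {
        dst = $5
        sub(/:$/, "", dst)          # drop the trailing colon
        sub(/\.[0-9]+$/, "", dst)   # drop the port number
        packets[dst]++
        bytes[dst] += $8
    }
    END {
        for (d in packets)
            printf "%s %d %d\n", d, packets[d], bytes[d]
    }' "$@" | sort
}
```

It can be run as `summarize_targets tcpdump.out`, or fed from a pipe with no arguments. Incomplete lines, such as a capture cut off mid-packet, are ignored rather than counted.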
Hadoop versions we use:
- MapReduce 2.0.0-mr1-cdh4.1.1
- HDFS 2.0.0-cdh4.1.1

hadoop-metrics2.properties content:

    datanode.period=20
    datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    datanode.sink.ganglia.servers=192.168.1.111:8649,192.168.1.113:8649, 192.168.1.115:8649
    datanode.sink.ganglia.tagsForPrefix.jvm=*
    datanode.sink.ganglia.tagsForPrefix.dfs=*
    datanode.sink.ganglia.tagsForPrefix.rpc=*
    datanode.sink.ganglia.tagsForPrefix.rpcdetailed=*
    datanode.sink.ganglia.tagsForPrefix.metricssystem=*

--
Best Regards
Ivan Tretyakov