MapReduce, mail # user - Problem sending metrics to multiple targets


Problem sending metrics to multiple targets
Ivan Tretyakov 2013-01-17, 15:17
Hi!

We have the following problem.

There are three target hosts to send metrics to: 192.168.1.111:8649,
192.168.1.113:8649, 192.168.1.115:8649 (node01, node03, node05).
But the datanode, for example (using
org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31), sends some metrics to
the first target host and others to the second and third.
So some metrics are missing on the second and third nodes. When gmetad
collects metrics from one of these, certain metrics do not appear in Ganglia.

E.g. on node07 only one process sends metrics to Ganglia, the datanode
process, and tcpdump shows the following.

Dumping traffic for about three minutes:
$ sudo -i tcpdump dst port 8649 and src host node07 | tee tcpdump.out
...
$ head -n1 tcpdump.out
12:18:05.559719 IP node07.dom.local.43350 > node01.dom.local.8649: UDP,
length 180
$ tail -n1 tcpdump.out
12:20:59.575144 IP node

Then count packets and bytes sent to each target:
$ grep node01 tcpdump.out | wc -l
5972
$ grep node03 tcpdump.out | wc -l
3812
$ grep node05 tcpdump.out | wc -l
3811
$ grep node01 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
1048272
$ grep node03 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
731604
$ grep node05 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
731532
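
The six pipelines above can be folded into a single pass over the capture. This is only a minimal sketch, assuming tcpdump.out holds one packet per line in the format shown earlier (destination in field 5, UDP length in field 8). count_by_target is a hypothetical helper name, not part of tcpdump or Ganglia.

```shell
# Hypothetical helper: per-destination packet and byte totals from
# tcpdump text output. Reads the named files, or stdin if none are given.
count_by_target() {
    awk '$4 == ">" && $7 == "length" {
        dst = $5
        sub(/\.[^.]+:$/, "", dst)      # drop the trailing ".port:" part
        packets[dst]++
        bytes[dst] += $8
    }
    END {
        for (d in packets)
            printf "%s %d %d\n", d, packets[d], bytes[d]
    }' "$@" | sort
}
```

Running `count_by_target tcpdump.out` prints one line per destination host: name, packet count, byte total. The totals should agree with the grep counts above as long as every captured line has that shape.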

We can also ask the gmond daemons which metrics they have:

$ nc node01 8649 | grep ProcessName_DataNode | head -n1
<METRIC NAME="jvm.JvmMetrics.ProcessName_DataNode.LogFatal" VAL="0"
TYPE="float" UNITS="" TN="0" TMAX="60" DMAX="0" SLOPE="positive">
$ nc node03 8649 | grep ProcessName_DataNode | head -n1
$ nc node05 8649 | grep ProcessName_DataNode | head -n1
$ nc node01 8649 | grep ProcessName_DataNode | wc -l
100
$ nc node03 8649 | grep ProcessName_DataNode | wc -l
0
$ nc node05 8649 | grep ProcessName_DataNode | wc -l
0

So only the first collector node in the list has the ProcessName_DataNode
metrics.
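
To see exactly which metric names a collector lacks, rather than only the counts, the gmond XML dumps can be reduced to name lists and compared. This is a rough sketch, assuming each METRIC element starts on its own line as in the output above. metric_names is a hypothetical helper name, and the .names files are scratch files chosen for illustration.

```shell
# Hypothetical helper: print the sorted, de-duplicated METRIC names
# found in a gmond XML dump read from stdin.
metric_names() {
    sed -n 's/.*<METRIC NAME="\([^"]*\)".*/\1/p' | sort -u
}

# Possible usage (needs network access to the collectors):
#   nc node01 8649 | metric_names > node01.names
#   nc node03 8649 | metric_names > node03.names
#   comm -23 node01.names node03.names   # names node01 has and node03 lacks
```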

Hadoop versions we use:
- MapReduce 2.0.0-mr1-cdh4.1.1
- HDFS 2.0.0-cdh4.1.1

hadoop-metrics2.properties content:

datanode.period=20
datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
datanode.sink.ganglia.servers=192.168.1.111:8649,192.168.1.113:8649,
192.168.1.115:8649
datanode.sink.ganglia.tagsForPrefix.jvm=*
datanode.sink.ganglia.tagsForPrefix.dfs=*
datanode.sink.ganglia.tagsForPrefix.rpc=*
datanode.sink.ganglia.tagsForPrefix.rpcdetailed=*
datanode.sink.ganglia.tagsForPrefix.metricssystem=*

--
Best Regards
Ivan Tretyakov