Hadoop, mail # user - Re: Problem sending metrics to multiple targets
Re: Problem sending metrics to multiple targets
Ivan Tretyakov 2013-11-22, 13:30
We investigated the problem and found the root cause. The Metrics2 framework
uses a different config parser than the first Metrics version (Metrics2 uses
Apache Commons, Metrics uses Hadoop's own
parser). org.apache.hadoop.metrics2.sink.ganglia.AbstractGangliaSink thus gets
commas treated as list delimiters by default. So when we provide a list of
servers, it only reads the value up to the first separator - that is, only
the first server from the list.
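The effect can be sketched in a few lines. This is not the actual Hadoop or
commons-configuration code, just a minimal illustration of the delimiter
behavior described above (method names here are made up for the demo):

```java
import java.util.Arrays;
import java.util.List;

public class DelimiterDemo {
    // Mimics the commons-configuration behavior described above: a
    // comma-separated property value is treated as a list, so reading it
    // back as a single string yields only the first element.
    static String readLikeCommonsConfig(String rawValue) {
        return rawValue.split(",")[0].trim();
    }

    // Mimics org.apache.hadoop.metrics2.util.Servers, which accepts both
    // commas and whitespace as separators between server addresses.
    static List<String> parseServers(String value) {
        return Arrays.asList(value.trim().split("[,\\s]+"));
    }

    public static void main(String[] args) {
        String commaList = "192.168.1.111:8649,192.168.1.113:8649,192.168.1.115:8649";
        // The list is truncated before the Servers parser ever sees it:
        System.out.println(parseServers(readLikeCommonsConfig(commaList)));
        // prints [192.168.1.111:8649]
    }
}
```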
But we were able to find a workaround. The class that parses the server
list (org.apache.hadoop.metrics2.util.Servers) handles both commas and
spaces. This means that if we provide a space-separated list of servers
instead of a comma-separated one, the new parser reads the whole list.
After that, all servers are registered as metrics receivers and metrics
are sent to all of them.
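Concretely (using the same addresses as in the original message below), the
workaround in hadoop-metrics2.properties is to separate the servers with
spaces instead of commas:

```properties
datanode.sink.ganglia.servers=192.168.1.111:8649 192.168.1.113:8649 192.168.1.115:8649
```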
On Thu, Jan 17, 2013 at 7:17 PM, Ivan Tretyakov <[EMAIL PROTECTED]> wrote:

> Hi!
>
> We have the following problem.
>
> There are three target hosts to send metrics to: 192.168.1.111:8649,
> 192.168.1.113:8649, 192.168.1.115:8649 (node01, node03, node05).
> But, for example, the datanode (using
> org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31) sends some metrics
> to the first target host and others to the second and third.
> So some metrics are missing on the second and third nodes; when gmetad
> collects metrics from one of these, we cannot see certain metrics in
> ganglia.
>
> E.g. node07 runs only one process that sends metrics to ganglia - the
> datanode process - and we can see the following using tcpdump.
>
> Dumping traffic for about three minutes:
> $ sudo -i tcpdump dst port 8649 and src host node07 | tee tcpdump.out
> ...
> $ head -n1 tcpdump.out
> 12:18:05.559719 IP node07.dom.local.43350 > node01.dom.local.8649: UDP,
> length 180
> $ tail -n1 tcpdump.out
> 12:20:59.575144 IP node
>
> Then count packets and bytes sent to each target:
> $ grep node01 tcpdump.out | wc -l
> 5972
> $ grep node03 tcpdump.out | wc -l
> 3812
> $ grep node05 tcpdump.out | wc -l
> 3811
> $ grep node01 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 1048272
> $ grep node03 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 731604
> $ grep node05 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 731532
>
> We can also query the gmond daemons to see which metrics they hold:
>
> $ nc node01 8649 | grep ProcessName_DataNode | head -n1
> <METRIC NAME="jvm.JvmMetrics.ProcessName_DataNode.LogFatal" VAL="0"
> TYPE="float" UNITS="" TN="0" TMAX="60" DMAX="0" SLOPE="positive">
> $ nc node03 8649 | grep ProcessName_DataNode | head -n1
> $ nc node05 8649 | grep ProcessName_DataNode | head -n1
> $ nc node01 8649 | grep ProcessName_DataNode | wc -l
> 100
> $ nc node03 8649 | grep ProcessName_DataNode | wc -l
> 0
> $ nc node05 8649 | grep ProcessName_DataNode | wc -l
> 0
>
> We can see that only the first collector node from the list has these
> metrics.
>
> The Hadoop versions we use:
> - MapReduce 2.0.0-mr1-cdh4.1.1
> - HDFS 2.0.0-cdh4.1.1
>
> hadoop-metrics2.properties content:
>
> datanode.period=20
>
> datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
> datanode.sink.ganglia.servers=192.168.1.111:8649,192.168.1.113:8649,192.168.1.115:8649
> datanode.sink.ganglia.tagsForPrefix.jvm=*
> datanode.sink.ganglia.tagsForPrefix.dfs=*
> datanode.sink.ganglia.tagsForPrefix.rpc=*
> datanode.sink.ganglia.tagsForPrefix.rpcdetailed=*
> datanode.sink.ganglia.tagsForPrefix.metricssystem=*
>
> --
> Best Regards
> Ivan Tretyakov
>

--
Best Regards
Ivan Tretyakov