Flume, mail # user - Need for UDP / Multicast Source

Re: Need for UDP / Multicast Source
Andrew Otto 2013-01-17, 17:33
> Since each sink really has just one thread driving them, adding multiple sinks might help.

Oh hey, how does hdfs.threadsPoolSize relate to adding multiple sinks?  The docs describe it as:

  Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
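One possible reading, stated as an assumption rather than something the docs quoted above confirm: the sink's single runner thread hands each IO call to the threadsPoolSize pool and then waits for the result, so that it can enforce a timeout. If that is how it works, a bigger pool would not overlap writes within one sink, and adding sinks would be the only way to get more IO in flight. The sketch below models that pattern in plain Java. Everything in it (class name, the 5 ms sleep standing in for an HDFS write) is hypothetical and is not Flume code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleDriverSketch {
    /**
     * One driver thread submits `batches` simulated IO calls to a pool of `poolSize`
     * threads and waits on each Future before submitting the next one.
     * Returns the peak number of calls that were running at the same time.
     */
    public static int maxInFlight(int poolSize, int batches) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < batches; i++) {
                Future<?> f = ioPool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // hypothetical stand-in for an HDFS open/write/flush
                    inFlight.decrementAndGet();
                    return null;
                });
                // The driver blocks here (a timed wait), so the next call cannot start early.
                f.get(10, TimeUnit.SECONDS);
            }
        } finally {
            ioPool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak in-flight with 10 pool threads: " + maxInFlight(10, 50));
    }
}
```

Under this model the peak stays at 1 whatever the pool size, which would match one busy core per sink; two sinks would give two independent driver threads.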

I've got 24 cores (12 physical + hyperthreading) on the machine I'm using to test this stuff.  I only see one core under heavy use.  There are currently 98 flume threads running, and they are (relatively) spread out across all of the CPUs.  I'm starting to suspect that the source thread just can't keep up with all of the incoming UDP data, so packets are being dropped somewhere.  When this happens with another C program that we use internally to consume this stream, I see the 'drops' counter increase for the port in /proc/<pid>/net/udp, but I am not seeing that happen now.
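To watch that same counter from the Java side, a minimal sketch, assuming Linux and the usual /proc/net/udp layout where the local address is `hexIP:hexPort` in the second column and `drops` is the last column (older kernels may lack that column). The class and method names are hypothetical. It sums drops over every socket bound to a given local port.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class UdpDropsParser {
    /** Drop count for one /proc/net/udp row, or -1 for the header row or a malformed line. */
    public static long parseDrops(String line) {
        String[] f = line.trim().split("\\s+");
        // Socket rows start with a slot number like "123:"; the header starts with "sl".
        if (f.length < 13 || !f[0].endsWith(":")) return -1;
        return Long.parseLong(f[f.length - 1]);
    }

    /** Local port of a socket row; the address column looks like "00000000:0202" (hex). */
    public static int parseLocalPort(String line) {
        String local = line.trim().split("\\s+")[1];
        return Integer.parseInt(local.substring(local.indexOf(':') + 1), 16);
    }

    /** Sums the drops counter for every socket bound to `port`. */
    public static long dropsForPort(List<String> lines, int port) {
        long total = 0;
        for (String line : lines) {
            long d = parseDrops(line);
            if (d >= 0 && parseLocalPort(line) == port) total += d;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);
        String pid = args.length > 1 ? args[1] : "self";
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "net", "udp"));
        System.out.println("drops on port " + port + ": " + dropsForPort(lines, port));
    }
}
```

Polling this every few seconds while the Flume agent runs (e.g. `java UdpDropsParser 514 <pid>`) gives the same number as reading the file by hand, just easier to log next to the agent's own counters.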

Is there a way to know if the JVM (or in this case Netty?) is dropping UDP packets?  As far as I can tell, Java's UDP interface is just a wrapper around the native UDP socket implementation, so there shouldn't be anything hidden here.  Or maybe there is some sneaky JVM/Netty buffering going on that I don't know about?
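On Linux, that drops counter mostly counts datagrams the kernel discarded because the socket's receive buffer was full, which happens before the JVM (or Netty) ever sees them. So one concrete thing to check is how large a buffer the OS actually grants the Java socket. A minimal sketch using plain `java.net.DatagramSocket` (how Netty or the Flume source configures its own socket is not shown here and would need checking): it requests a buffer and reports what the kernel granted. Assumption to verify: Linux caps the request at `net.core.rmem_max` and reports back roughly double the stored value, so the "granted" number can differ from the request in either direction.

```java
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.SocketException;

public class UdpReceiveBuffer {
    /** Requests a receive buffer of `requested` bytes and returns what the OS reports back. */
    public static int grantedReceiveBuffer(int requested) throws SocketException {
        // Create the socket unbound so the buffer size is set before any packets can arrive.
        try (DatagramSocket socket = new DatagramSocket((SocketAddress) null)) {
            socket.setReceiveBufferSize(requested);
            socket.bind(new InetSocketAddress(0)); // ephemeral port, just for the demo
            return socket.getReceiveBufferSize();
        }
    }

    public static void main(String[] args) throws SocketException {
        int requested = 16 * 1024 * 1024;
        System.out.println("requested " + requested + " bytes, granted "
                + grantedReceiveBuffer(requested));
    }
}
```

If the granted size is far below the request, raising `net.core.rmem_max` (and the buffer size the source asks for) is a cheap experiment before suspecting anything hidden in the JVM.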