Re: How to design a robust producer?
Andrew Otto 2014-01-30, 16:00
Thibaud, I wouldn't say this is a 'robust' solution, but the Wikimedia Foundation uses a piece of software we wrote called udp2log. We are in the process of replacing it with more robust direct Kafka producers, but it has worked for us in the interim. udp2log is a C++ daemon that listens for newline-delimited messages over UDP and then multiplexes them out to pipes or files. You could use it to pipe your UDP traffic into the default console producer that ships with Kafka. Not 'robust' for sure, but I think it would work.
Deb package: http://apt.wikimedia.org/wikimedia/pool/main/u/udplog/
Example config: https://gist.github.com/ottomata/8711809
Also, as a proof of concept, one of my coworkers wrote this:
Similar to udp2log, but meant for exactly what you are asking for: relaying UDP packets into Kafka.
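To make the idea concrete, here is a minimal Java sketch of that kind of relay: it listens on a UDP port, splits each datagram into newline-delimited messages, and prints them to stdout so they can be piped elsewhere. This is only an illustration, not udp2log's code or the proof of concept mentioned above; the class name `UdpLineRelay`, the helper `splitLines`, the default port 8420, and the UTF-8 assumption are all made up for the example.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical relay: names, port, and encoding are assumptions for illustration.
class UdpLineRelay {

    // Split one datagram payload into its newline-delimited messages,
    // dropping empty lines and a trailing '\r' if present.
    static List<String> splitLines(byte[] data, int length) {
        String payload = new String(data, 0, length, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : payload.split("\n")) {
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8420; // assumed port
        byte[] buffer = new byte[65535]; // big enough for any single UDP payload
        try (DatagramSocket socket = new DatagramSocket(port)) {
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            while (true) {
                packet.setLength(buffer.length); // reset before each receive
                socket.receive(packet);          // blocks until a datagram arrives
                for (String line : splitLines(packet.getData(), packet.getLength())) {
                    System.out.println(line);    // one message per output line
                }
            }
        }
    }
}
```

You would run this and pipe its stdout into the console producer script that ships with Kafka; the exact script flags depend on your Kafka version. Like udp2log, it has no delivery guarantees: anything the socket drops, or anything in flight when the process dies, is lost.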
On Jan 30, 2014, at 10:20 AM, Clark Breyman <[EMAIL PROTECTED]> wrote:
> Sounds like one of your issues will be upstream of Kafka. Robust and UDP
> aren't something I usually think of together unless you have additional
> bookkeeping to detect and request lost messages. 8MB/s shouldn't be much of
> a problem unless the messages are very small and you are looking for individual
> commits. You also have the challenge of having the server
> process/machine/network go away after the UDP message is received but
> before it can be pushed to Kafka.
> Beyond that, there are a lot of server frameworks that work fine. I use
> Dropwizard mostly since I like Java, though it doesn't support UDP
> resources. There are plenty of options there, but that's probably not a
> Kafka issue.
> On Thu, Jan 30, 2014 at 6:38 AM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>> Well, you could start by looking at the Kafka Producer source code for some
>> ideas. We have built plenty of solid software on that.
>> As to your goal of building something solid, robust, and critical: all I
>> can say is that you need to keep your Producer as simple as possible. The
>> simpler it is, the less likely it is to crash or have bugs. You must also test
>> it very well. Get the data to Kafka as fast as possible, so the chance of
>> losing any due to a crash is very small. Take a long time to test it. The
>> Producers I have written (in C++) run for weeks without going down (and
>> then we usually bring them down on purpose for upgrades). However, they
>> were in test for months too.
>> On Thu, Jan 30, 2014 at 6:31 AM, Thibaud Chardonnens
>> <[EMAIL PROTECTED]> wrote:
>>> Thanks for your quick answer.
>>> Yes, sorry, it's probably too broad, but my main question was whether there
>>> are any best practices for building a robust, fault-tolerant producer that
>>> guarantees that no data will be dropped while listening on the port.
>>> From my point of view the producer will be the most critical part of the
>>> system: if something goes wrong with it, the workflow will be stopped and
>>> data will be lost.
>>> Do you have by any chance a pointer to an existing implementation of
>>> such a producer?
>>> On 30 Jan 2014, at 15:13, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>>>> What exactly are you struggling with? Your question is too broad. What
>>>> you want to do is eminently possible; I have done it myself from scratch.
>>>>> On Jan 30, 2014, at 6:00 AM, Thibaud Chardonnens <[EMAIL PROTECTED]> wrote:
>>>>> Hello -- I am struggling with how to design a robust implementation of
>>>>> a producer.
>>>>> My use case is quite simple:
>>>>> I want to process a relatively big stream (~8MB/s) with Storm. Kafka
>>>>> will be used as an intermediary between the stream and Storm. The stream is
>>>>> sent to a specific server on a specific port (through UDP). So Storm will
>>>>> be the consumer and I need to write a producer (basically in Java) that