Well, you could start by looking at the Kafka Producer source code for some
ideas. We have built plenty of solid software on that.
As to your goal of building something solid, robust, and critical: all I
can say is that you need to keep your Producer as simple as possible. The
simpler it is, the less likely it is to crash or have bugs. You must also
test it very well. Get the data to Kafka as fast as possible, so the chance
of losing any due to a crash is very small. Take a long time to test it. The
Producers I have written (in C++) run for weeks without going down (and
then we usually bring them down on purpose for upgrades). However, they
were in test for months too.
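To make "keep it simple and get the data out fast" concrete, here is a
minimal Java sketch of that shape. It is not taken from my C++ Producers.
The class name UdpForwarder, the queue capacity parameter, and the sink
callback are all hypothetical, and the sink only stands in for whatever
call hands a message to your Kafka producer. One thread does nothing but
read datagrams and queue them, and a second thread drains the queue.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: a UDP listener that hands datagrams to a sink.
// The sink is an assumption standing in for the Kafka producer send call.
final class UdpForwarder implements AutoCloseable {
    private final DatagramSocket socket;
    private final BlockingQueue<byte[]> queue;
    private final Consumer<byte[]> sink;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread receiver;
    private final Thread sender;

    UdpForwarder(int port, int queueCapacity, Consumer<byte[]> sink) throws SocketException {
        this.socket = new DatagramSocket(port); // port 0 picks a free port
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
        this.sink = sink;
        this.receiver = new Thread(this::receiveLoop, "udp-receiver");
        this.sender = new Thread(this::sendLoop, "sink-sender");
    }

    void start() {
        receiver.start();
        sender.start();
    }

    int port() {
        return socket.getLocalPort();
    }

    // Datagrams discarded because the queue was full.
    long dropped() {
        return dropped.get();
    }

    // Does as little as possible per datagram: copy the payload and queue it.
    private void receiveLoop() {
        byte[] buf = new byte[65535]; // large enough for any UDP payload
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        while (true) {
            try {
                packet.setLength(buf.length); // receive() shrinks the length, so reset it
                socket.receive(packet);
            } catch (IOException e) {
                return; // socket was closed or failed, so stop receiving
            }
            byte[] payload = Arrays.copyOfRange(packet.getData(), packet.getOffset(),
                    packet.getOffset() + packet.getLength());
            // Never block the receive thread: count the drop instead of waiting.
            if (!queue.offer(payload)) {
                dropped.incrementAndGet();
            }
        }
    }

    // Drains the queue into the sink until interrupted by close().
    private void sendLoop() {
        try {
            while (true) {
                sink.accept(queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops both threads; anything still queued at this point is discarded.
    @Override
    public void close() {
        socket.close();
        sender.interrupt();
    }
}
```

The receive thread never waits on the sink, so a slow or stalled sink shows
up as a rising dropped() count rather than as silent loss in the kernel
socket buffer. The sketch does not handle a sink that throws, and it does
not flush the queue on shutdown. Both would need deciding (and testing)
before relying on anything like this.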
On Thu, Jan 30, 2014 at 6:31 AM, Thibaud Chardonnens wrote:
> Thanks for your quick answer.
> Yes, sorry, it's probably too broad, but my main question was whether there
> are any best practices for building a robust, fault-tolerant producer that
> guarantees that no data will be dropped while listening on the port.
> From my point of view the producer will be the most critical part of the
> system: if something goes wrong with it, the workflow will stop and
> data will be lost.
> Do you by any chance have a pointer to an existing implementation of
> such a producer?
> On Jan 30, 2014, at 15:13, Philip O'Toole <[EMAIL PROTECTED]> wrote:
> > What exactly are you struggling with? Your question is too broad. What
> > you want to do is eminently possible; I have done it myself from scratch.
> > Philip
> >> On Jan 30, 2014, at 6:00 AM, Thibaud Chardonnens <[EMAIL PROTECTED]> wrote:
> >> Hello -- I am struggling with how to design a robust implementation of
> >> a producer.
> >> My use case is quite simple:
> >> I want to process a relatively big stream (~8 MB/s) with Storm. Kafka
> >> will be used as an intermediary between the stream and Storm. The stream
> >> is sent to a specific server on a specific port (over UDP). So Storm will
> >> be the consumer, and I need to write a producer (basically in Java) that
> >> will listen on that specific port and send messages to a Kafka topic.
> >> Kafka and Storm are well designed and fault-tolerant: if a node goes
> >> down, the whole environment continues to work properly. Therefore my
> >> producer will be the single point of failure in the workflow. Moreover,
> >> writing such a producer is not so easy: I'll need to write a multithreaded
> >> server to keep up with the throughput of the stream, with no guarantee
> >> that no data will be dropped...
> >> So I would like to know whether there are any best practices for writing
> >> such a producer, or whether there is another (maybe simpler) way to do it.
> >> Thanks,
> >> Thibaud