Correct, we essentially use the logs as an additional buffer in case of an
outage in the pipeline. Typically, though, messages are produced as soon as
they are written.
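To make the "logs as a buffer" idea concrete, here is a minimal Python sketch of the shipping side, under stated assumptions: the function name `ship_new_lines` and the `send` callback are hypothetical, and `send` stands in for whatever actually produces to Kafka (it is not a Kafka client API). The key point is that the shipper advances its byte offset into the log file only after `send` succeeds. If the pipeline is down, the lines stay in the file and are re-read on the next attempt.

```python
from typing import Callable, List


def ship_new_lines(path: str, offset: int, send: Callable[[List[str]], None]) -> int:
    """Ship complete lines appended to `path` since byte `offset`.

    `send` is a hypothetical callback standing in for a Kafka producer.
    Returns the new offset. If `send` raises (e.g. the pipeline is down),
    the exception propagates and the caller keeps the old offset, so the
    same lines are re-read from the log file on the next attempt.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Only ship complete lines; a trailing partial line waits for the next pass.
    end = data.rfind(b"\n")
    if end == -1:
        return offset
    chunk = data[: end + 1]
    lines = [line.decode("utf-8", errors="replace") for line in chunk.splitlines()]

    send(lines)  # may raise; in that case the offset is not advanced
    return offset + len(chunk)
```

In a real shipper you would call this in a loop, persist the returned offset somewhere durable, and handle log rotation and truncation, none of which this sketch does.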
On Fri, Jun 7, 2013 at 6:06 PM, Mark <[EMAIL PROTECTED]> wrote:
> Ok so in your use case instead of your application(s) writing directly to
> Kafka you instead have a separate process running that will tail log files
> and ship them over to Kafka. Is that correct?
> On Jun 7, 2013, at 5:33 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
> > I recommend Kafka or Flume-NG for this.
> > Our Analytics team is using a Kafka Producer on each server to tail logs
> > and ship them to Kafka. We use Oozie to schedule a MapReduce consumer
> > every few minutes to read all the Kafka topics into HDFS.
> > We use Kafka as a buffer; we keep a few weeks of data there. Our security
> > team, for example, sometimes connects and consumes some logs for various
> > purposes, usually when they want aggregated log data in real time.
> > Most folks access them in HDFS. We have <1 minute of delay for most log
> > lines getting from the server where they were written to HDFS.
> > -Jonathan
> > On Fri, Jun 7, 2013 at 5:30 PM, Mark <[EMAIL PROTECTED]> wrote:
> >> Like I said, I'm a bit confused. I see the terms "events", "messages" and
> >> "logs" and I'm not quite sure what to make of them.
> >> We are trying to determine the best way to aggregate all of our logs for
> >> processing in Hadoop. Kafka seems to fit this bill nicely; however, I want
> >> to know if it's suited for other types of messages as well. Are there
> >> determining factors for choosing Kafka over RabbitMQ? Is it
> >> scale, or is it the type of messages/events/logs being produced/consumed?
> >> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <[EMAIL PROTECTED]>
> >> wrote:
> >>> On Sat, Jun 8, 2013 at 1:08 AM, Mark <[EMAIL PROTECTED]> wrote:
> >>>> I'm a bit confused about the concept of a "message" in Kafka. How does
> >> this differ, if at all, from a message in RabbitMQ? It seems to me that
> >> Kafka is better suited for very write intensive "messages" like log data
> >> but RabbitMQ may be a better fit for traditional "messages"… i.e.
> >> Purchased" or "User Registered" message.
> >>> I'm not sure why you think this, or how to distinguish between a 'log'
> >>> message and some other kind.
> >>> Messages = data, annotated with metadata. The latter is typically a
> >>> protocol-specific envelope. Kafka and Rabbit certainly have different
> >>> envelopes, e.g. for mapping data to subscribers/queries.
> >>> alexis
> > --
> > *Jonathan Creasy* | Sr. Ops Engineer
> > e: [EMAIL PROTECTED] | t: 314.580.8909