Mark 2013-06-08, 00:09
Alexis Richardson 2013-06-08, 00:22
Mark 2013-06-08, 00:30
Jonathan Creasy 2013-06-08, 00:33
Mark 2013-06-08, 01:06
Jonathan Creasy 2013-06-08, 01:53
Mark 2013-06-08, 03:13
Re: Kafka Messages
*We have a mixture of:*

ERROR: something bad happened

*Some logs are "actions":*

user upload a file
user collaborated a file

*Some logs are metrics:*

counter:webapp.rps:+1
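
As a rough illustration (not part of the original message), the three kinds of lines above could be told apart with a small classifier. This is a minimal sketch in Python: the `LogRecord` type, the `classify` function, and the topic names in `TOPIC_BY_KIND` are hypothetical, and the `counter:<name>:<delta>` pattern is inferred from the single sample line shown.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    kind: str                     # "error", "metric", or "action"
    body: str                     # the message text
    metric: Optional[str] = None  # counter name, for metrics only
    delta: Optional[int] = None   # signed increment, for metrics only


# Assumed shape of a metric line, based on "counter:webapp.rps:+1".
_COUNTER = re.compile(r"^counter:(?P<name>[^:]+):(?P<delta>[+-]\d+)$")

# Hypothetical topic names; the thread does not say how topics are laid out.
TOPIC_BY_KIND = {
    "error": "logs.errors",
    "action": "logs.actions",
    "metric": "logs.metrics",
}


def classify(line: str) -> LogRecord:
    """Sort one raw log line into error, metric, or action."""
    line = line.strip()
    match = _COUNTER.match(line)
    if match:
        return LogRecord("metric", line, match.group("name"),
                         int(match.group("delta")))
    if line.startswith("ERROR:"):
        return LogRecord("error", line[len("ERROR:"):].strip())
    # Anything else is treated as an "action" line in this sketch.
    return LogRecord("action", line)
```

For example, `classify("counter:webapp.rps:+1")` yields a metric record named `webapp.rps` with a delta of 1, while `classify("user upload a file")` falls through to the action case.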

-Jonathan
On Fri, Jun 7, 2013 at 8:12 PM, Mark <[EMAIL PROTECTED]> wrote:

> Are these always log files in the strict sense, or do they also contain
> some event data, e.g. "Product A was purchased" or "User A just signed
> in"?
>
> On Jun 7, 2013, at 6:53 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
>
> Correct, we essentially use the logs as an additional buffer in case of
> an outage in the pipeline. Typically, though, messages are produced as
> soon as they are written.
>
>
> -Jonathan
>
>
> On Fri, Jun 7, 2013 at 6:06 PM, Mark <[EMAIL PROTECTED]> wrote:
>
>> OK, so in your use case, instead of your application(s) writing directly
>> to Kafka, you have a separate process running that tails log files and
>> ships them over to Kafka. Is that correct?
>>
>> On Jun 7, 2013, at 5:33 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
>>
>> > I recommend Kafka or Flume-NG for this.
>> >
>> > Our Analytics team is using a Kafka Producer on each server to tail logs
>> > and ship them to Kafka. We use Oozie to schedule a MapReduce consumer
>> > every few minutes to read all the Kafka topics into HDFS.
>> >
>> > We use Kafka as a buffer; we keep a few weeks of data there. Our
>> > security team, for example, sometimes connects and consumes some logs
>> > for various purposes, usually when they want aggregate log data in
>> > real time.
>> >
>> > Most folks access them in HDFS. We have <1 minute of delay for most log
>> > lines getting from the server where they were written to HDFS.
>> >
>> > -Jonathan
>> >
>> >
>> > On Fri, Jun 7, 2013 at 5:30 PM, Mark <[EMAIL PROTECTED]> wrote:
>> >
>> >> Like I said, I'm a bit confused. I see the terms "events", "messages"
>> >> and "logs" and I'm not quite sure what to make of them.
>> >>
>> >> We are trying to determine the best way to aggregate all of our logs
>> >> for processing in Hadoop. Kafka seems to fit this bill nicely; however,
>> >> I want to know if it's suited for other types of messages as well. Are
>> >> there certain determining factors for why one would choose Kafka over
>> >> RabbitMQ? Is it mostly scale, or is it the type of
>> >> messages/events/logs being produced/consumed?
>> >>
>> >> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> On Sat, Jun 8, 2013 at 1:08 AM, Mark <[EMAIL PROTECTED]> wrote:
>> >>>> I'm a bit confused about the concept of a "message" in Kafka. How does
>> >>>> this differ, if at all, from a message in RabbitMQ? It seems to me that
>> >>>> Kafka is better suited for very write-intensive "messages" like log
>> >>>> data, but RabbitMQ may be a better fit for traditional "messages", i.e.
>> >>>> a "Product Purchased" or "User Registered" message.
>> >>>
>> >>> I'm not sure why you think this, or how to distinguish between a 'log'
>> >>> message and some other kind.
>> >>>
>> >>> Messages = data, annotated with metadata.  The latter is typically a
>> >>> protocol-specific envelope.  Kafka and Rabbit certainly have different
>> >>> envelopes, e.g. for mapping data to subscribers/queries.
>> >>>
>> >>> alexis
>> >>
>> >>
>> >
>> >
>> > --
>> > *Jonathan Creasy* | Sr. Ops Engineer
>> >
>> > e: [EMAIL PROTECTED] | t: 314.580.8909
>>
>>
>
>
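
The tail-and-ship setup described in the quoted thread, where the log file itself acts as a buffer during a pipeline outage, might look roughly like the sketch below. It is not the code used by the posters. The function name `ship_new_lines`, the byte-offset checkpoint, and the `send` callable are all assumptions made for illustration; `send` stands in for whatever producer call is actually used.

```python
from typing import Callable


def ship_new_lines(log_path: str, offset: int,
                   send: Callable[[bytes], None]) -> int:
    """Send each complete line found at or after `offset`.

    Returns the byte offset just past the last line that was sent
    successfully, so the caller can persist it and resume later.
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                # End of file, or a partial line the application is
                # still writing; pick it up on the next call.
                break
            try:
                send(line.rstrip(b"\r\n"))
            except OSError:
                # Pipeline outage (ConnectionError is an OSError):
                # leave the offset alone so the line is retried later.
                # The log file on disk is the buffer.
                break
            offset = f.tell()
    return offset
```

A caller would invoke this in a loop, sleeping briefly between calls and saving the returned offset somewhere durable. If the kafka-python client were used, `send` could be a small wrapper around `KafkaProducer.send(topic, value)`; that choice of client is an assumption, since the thread does not name one.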