Kafka >> mail # user >> Kafka Messages


Mark 2013-06-08, 00:09
Alexis Richardson 2013-06-08, 00:22
Mark 2013-06-08, 00:30
Jonathan Creasy 2013-06-08, 00:33
Mark 2013-06-08, 01:06
Jonathan Creasy 2013-06-08, 01:53
Mark 2013-06-08, 03:13

Re: Kafka Messages
*We have a mixture of:*

ERROR: something bad happened

*Some logs are "actions":*

user uploaded a file
user collaborated on a file

*Some logs are metrics:*

counter:webapp.rps:+1
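
To make the three kinds of lines above concrete, here is a minimal sketch of how a consumer might tell them apart. The regular expression, the function name `classify`, and the rule that anything unrecognized counts as an "action" are all assumptions made for illustration; the thread only shows one sample line per kind and does not describe the real parsing rules.

```python
import re

# Hypothetical pattern for metric lines shaped like "counter:webapp.rps:+1".
# The thread shows only this one sample, so the exact grammar is assumed.
METRIC_RE = re.compile(r"^(?P<kind>\w+):(?P<name>[\w.]+):(?P<delta>[+-]?\d+)$")


def classify(line):
    """Return (category, payload) for one raw log line."""
    line = line.strip()
    match = METRIC_RE.match(line)
    if match:
        return "metric", {
            "kind": match.group("kind"),
            "name": match.group("name"),
            "delta": int(match.group("delta")),
        }
    if line.startswith("ERROR:"):
        # Keep only the text after the "ERROR:" prefix.
        return "error", line[len("ERROR:"):].strip()
    # Anything else is treated as an "action" line in this sketch.
    return "action", line
```

A consumer could use the returned category to route each message, for example sending metrics to a counter store and errors to an alerting job.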

-Jonathan
On Fri, Jun 7, 2013 at 8:12 PM, Mark <[EMAIL PROTECTED]> wrote:

> Are these always log files in the strict sense, or do they also
> contain some event data, e.g. "Product A was purchased" or "User A just
> signed in"?
>
> On Jun 7, 2013, at 6:53 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
>
> Correct, we essentially use the logs as an additional buffer in case of
> an outage in the pipeline. Typically, though, messages are produced as soon
> as they are written.
>
>
> -Jonathan
>
>
> On Fri, Jun 7, 2013 at 6:06 PM, Mark <[EMAIL PROTECTED]> wrote:
>
>> OK, so in your use case, instead of your application(s) writing directly to
>> Kafka, you have a separate process running that tails log files and ships
>> them over to Kafka. Is that correct?
>>
>> On Jun 7, 2013, at 5:33 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
>>
>> > I recommend Kafka or Flume-NG for this.
>> >
>> > Our Analytics team is using a Kafka Producer on each server to tail logs
>> > and ship them to Kafka. We use Oozie to schedule a MapReduce consumer
>> > every few minutes to read all the Kafka topics into HDFS.
>> >
>> > We use Kafka as a buffer and keep a few weeks of data there. Our security
>> > team, for example, sometimes connects and consumes some logs for various
>> > purposes, usually when they want aggregate log data in real time.
>> >
>> > Most folks access them in HDFS. We have <1 minute of delay for most log
>> > lines getting from the server where they were written to HDFS.
>> >
>> > -Jonathan
>> >
>> >
>> > On Fri, Jun 7, 2013 at 5:30 PM, Mark <[EMAIL PROTECTED]> wrote:
>> >
>> >> Like I said, I'm a bit confused. I see the terms "events", "messages" and
>> >> "logs" and I'm not quite sure what to make of them.
>> >>
>> >> We are trying to determine the best way to aggregate all of our logs for
>> >> processing in Hadoop. Kafka seems to fit this bill nicely; however, I want
>> >> to know if it's suited for other types of messages as well. Are there
>> >> certain determining factors for choosing Kafka over RabbitMQ? Is it mostly
>> >> scale, or is it the type of messages/events/logs being produced/consumed?
>> >>
>> >> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> On Sat, Jun 8, 2013 at 1:08 AM, Mark <[EMAIL PROTECTED]> wrote:
>> >>>> I'm a bit confused about the concept of a "message" in Kafka. How does
>> >>>> this differ, if at all, from a message in RabbitMQ? It seems to me that
>> >>>> Kafka is better suited for very write-intensive "messages" like log data,
>> >>>> but RabbitMQ may be a better fit for traditional "messages", e.g. a
>> >>>> "Product Purchased" or "User Registered" message.
>> >>>
>> >>> I'm not sure why you think this, or how to distinguish between a 'log'
>> >>> message and some other kind.
>> >>>
>> >>> Messages = data, annotated with metadata.  The latter is typically a
>> >>> protocol-specific envelope.  Kafka and Rabbit certainly have different
>> >>> envelopes, e.g. for mapping data to subscribers/queries.
>> >>>
>> >>> alexis
>> >>
>> >>
>> >
>> >
>> > --
>> > **
>> >
>> > *Jonathan Creasy* | Sr. Ops Engineer
>> >
>> > e: [EMAIL PROTECTED] | t: 314.580.8909
>>
>>
>
>
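
The tail-and-ship setup discussed in the quoted thread (a process on each server that follows log files and hands new lines to a Kafka producer) could look roughly like the sketch below. It is not the code Jonathan's team runs: the names `follow` and `ship`, the polling interval, and the `stop` and `from_start` parameters are all hypothetical, and the producer is abstracted as a plain `send` callable so the example stays self-contained.

```python
import time


def follow(path, poll_interval=0.5, from_start=False, stop=lambda: False):
    """Yield complete lines appended to `path`, similar to `tail -f`.

    `stop` is a hypothetical hook checked before each read so the
    loop can be ended cleanly.
    """
    with open(path, "r") as handle:
        if not from_start:
            handle.seek(0, 2)  # jump to the current end of the file
        buffer = ""
        while not stop():
            chunk = handle.readline()
            if not chunk:
                # Nothing new yet: wait and poll again.
                time.sleep(poll_interval)
                continue
            buffer += chunk
            if buffer.endswith("\n"):
                # Only emit a line once it has been written completely.
                yield buffer.rstrip("\n")
                buffer = ""


def ship(lines, send):
    """Pass each line to `send` and return how many were shipped."""
    count = 0
    for line in lines:
        send(line)
        count += 1
    return count
```

In a real deployment, `send` would wrap whichever Kafka client is in use. As an assumption, with the kafka-python package it might be `lambda line: producer.send("logs", line.encode("utf-8"))`, where `"logs"` is a placeholder topic name. Because the log file itself stays on disk, it can act as the extra buffer mentioned in the thread if the pipeline is down.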

 