I recommend Kafka or Flume-NG for this.
Our Analytics team is using a Kafka Producer on each server to tail logs
and ship them to Kafka. We use Oozie to schedule a MapReduce consumer every
few minutes to read all the Kafka topics into HDFS.
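To make the "tail and ship" step concrete, here is a minimal Python sketch of the idea, not our actual producer. The names `read_new_lines`, `ship_new_lines`, and the `send` callable are hypothetical and only for illustration; `send` stands in for whatever Kafka producer client you use, so nothing here depends on a particular Kafka library.

```python
from typing import BinaryIO, Callable, Iterator


def read_new_lines(handle: BinaryIO) -> Iterator[bytes]:
    """Yield complete lines appended since the last call.

    A trailing line without a newline is left unread so that a line
    still being written is not shipped in two pieces.
    """
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            return  # reached the current end of the file
        if not line.endswith(b"\n"):
            handle.seek(position)  # partial line: retry on the next poll
            return
        yield line.rstrip(b"\r\n")


def ship_new_lines(handle: BinaryIO, topic: str,
                   send: Callable[[str, bytes], None]) -> int:
    """Send each new complete line to `topic`; return how many were sent.

    `send` is a placeholder (assumption) for a real producer call.
    """
    sent = 0
    for line in read_new_lines(handle):
        send(topic, line)
        sent += 1
    return sent
```

In practice you would call `ship_new_lines` in a polling loop with a short sleep, keeping the same open file handle between calls. Log rotation and restarts (remembering the file offset) are left out of this sketch.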
We use Kafka as a buffer and keep a few weeks of data there. Our security
team, for example, sometimes connects and consumes some logs for various
purposes, usually when they want aggregate log data in real time.
Most folks access the logs in HDFS. For most log lines, the delay between
being written on the server and landing in HDFS is under a minute.
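Keeping a few weeks of data in Kafka is a matter of broker-side retention. As a rough sketch, the time-based retention setting in each broker's server.properties looks like the following. The value of 504 hours (three weeks) is an assumed example, not our exact setting, and you need enough disk on the brokers to hold that much data.

```properties
# server.properties on each Kafka broker
# Assumed example value: 504 hours = 21 days. Pick what your disks allow.
log.retention.hours=504
```

Because consumers track their own position in the log, an occasional reader can connect at any time within that window without affecting the regular HDFS load.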
On Fri, Jun 7, 2013 at 5:30 PM, Mark <[EMAIL PROTECTED]> wrote:
> Like I said, I'm a bit confused. I see the terms "events", "messages" and
> "logs" and am not quite sure what to make of them.
> We are trying to determine the best way to aggregate all of our logs for
> processing in Hadoop. Kafka seems to fit this bill nicely; however, I want to
> know if it's suited for other types of messages as well. Are there certain
> determining factors on why one would choose Kafka over RabbitMQ? Is it mostly
> scale or is it the type of messages/events/logs being produced/consumed?
> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <[EMAIL PROTECTED]> wrote:
> > On Sat, Jun 8, 2013 at 1:08 AM, Mark <[EMAIL PROTECTED]> wrote:
> >> I'm a bit confused on the concept of a "message" in Kafka. How does
> >> this differ, if at all, from a message in RabbitMQ? It seems to me that
> >> Kafka is better suited for very write-intensive "messages" like log data
> >> but RabbitMQ may be a better fit for traditional "messages"… i.e. "Product
> >> Purchased" or "User Registered" message.
> > I'm not sure why you think this, or how to distinguish between a 'log'
> > message and some other kind.
> > Messages = data, annotated with metadata. The latter is typically a
> > protocol-specific envelope. Kafka and Rabbit certainly have different
> > envelopes, e.g. for mapping data to subscribers/queries.
> > alexis
*Jonathan Creasy* | Sr. Ops Engineer
e: [EMAIL PROTECTED] | t: 314.580.8909