Mark 2011-11-03, 21:04
We had one problem that would pop up out of nowhere...
Another serious issue was when agents started to produce massive amounts
of data. For example, the logs produced by one machine were maybe
1 MB/minute, but when the agent was unable to communicate with any
collectors for whatever reason, it would fill up with GBs of data
sitting in one of Flume's subfolders (sent, sending, completed, etc.).
Any links on how to do real-time analysis using Kafka?
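
To make the question concrete, here is a minimal sketch of the pub-sub idea Neha describes in the quoted reply below: one stream feeding both a real-time reader and a bulk reader. It is an in-memory toy, not the Kafka API; the names TopicLog, Consumer, publish, read and poll are all hypothetical, and the log lines are made up. The only assumption taken from Kafka's design is that each consumer tracks its own position in an append-only log.

```python
from collections import Counter


class TopicLog:
    """Hypothetical in-memory stand-in for one topic: an append-only list."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        self._messages.append(message)

    def read(self, offset, max_messages):
        # The log keeps no per-consumer state; callers say where to start.
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Hypothetical consumer that remembers its own offset into the log."""

    def __init__(self, log, batch_size):
        self.log = log
        self.batch_size = batch_size
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset, self.batch_size)
        self.offset += len(batch)  # advance only this consumer's position
        return batch


log = TopicLog()
for line in ["GET /home 200", "GET /cart 500", "GET /home 200", "POST /buy 500"]:
    log.publish(line)

realtime = Consumer(log, batch_size=1)   # reads one message at a time
bulk = Consumer(log, batch_size=1000)    # reads everything available, e.g. a nightly load

first = realtime.poll()
everything = bulk.poll()

# Made-up "analysis": count responses by HTTP status code.
status_counts = Counter(line.split()[-1] for line in everything)
```

Because the two readers hold separate offsets, the bulk reader still sees every message even though the real-time reader has already consumed the first one.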
On 11/3/11 12:18 PM, Neha Narkhede wrote:
>>> First and foremost, we are currently using rsyslog to aggregate our logs from our application servers.
> This is similar to the legacy system we had at LinkedIn, now
> successfully replaced by Kafka.
>>> Although this strategy has been working for our bulk processing needs, it doesn't help us much with realtime analysis, something we would really like to introduce.
> Kafka is designed to efficiently feed both real time and offline data
> pipelines. Being a pub-sub messaging system, it fits the need for
> real-time applications well. Its high throughput nature and built-in
> consumer parallelism features make it a good fit for feeding large
> systems like Hadoop and data-warehouses. At LinkedIn, we use it for
> activity tracking as well as real time RPC log analysis.
> For more information, please visit our webpage -
> http://incubator.apache.org/kafka/index.html. It has a detailed design
> writeup, and quickstart for you to try it out.
>>> We've tried Flume but that didn't work out too well.
> I'm interested in knowing what roadblocks you hit while trying Flume
> out, for curiosity's sake?
> On Thu, Nov 3, 2011 at 11:58 AM, Mark<[EMAIL PROTECTED]> wrote:
>> Neha, thanks for the response.
>> I'll try and explain our use case. First and foremost, we are currently using
>> rsyslog to aggregate our logs from our application servers. This is
>> accomplished using its TCP plugin, which sends logs to a cluster of logging
>> machines. At the end of the day we then import this into Hadoop. Although
>> this strategy has been working for our bulk processing needs, it doesn't help
>> us much with realtime analysis, something we would really like to introduce.
>> We've tried Flume but that didn't work out too well. So now we are in the
>> process of looking into alternative technologies that can help us with both
>> our bulk and realtime analysis needs.
>> Does it sound like Kafka would be a nice fit for our use case? Are there any
>> examples or documentation on realtime analysis with Kafka?
>> On 11/3/11 11:37 AM, Neha Narkhede wrote:
>>> For activity on the mailing list, take a look at these metrics -
>>> For activity of the committers and the development -
>>> A full-fledged comparison can be quite lengthy. Would you mind
>>> describing your case? We can discuss the available alternatives and
>>> how Kafka would fit in.
>>> Kafka has been deployed in production at LinkedIn for over a year and
>>> a half. I believe there are other smaller startups using it too, and
>>> more in the pipeline.
>>> On Thu, Nov 3, 2011 at 11:00 AM, Mark<[EMAIL PROTECTED]> wrote:
>>>> I was wondering what the current state of Kafka is. Is it gaining much
>>>> traction? How active are the project, committers and mailing lists? Are
>>>> there other, more popular alternatives out there? Any comparison would help.
>>>> Thanks for any input.