|
Karl Hennig
2012-04-20, 22:14
alo alt
2012-04-21, 09:07
M. C. Srivas
2012-04-21, 14:06
alo alt
2012-04-21, 15:42
Edward Capriolo
2012-04-21, 22:23
Chen He
2012-04-21, 23:50
Alexander Lorenz
2012-04-22, 04:59
Edward Capriolo
2012-04-22, 14:14
Michael Segel
2012-04-22, 15:34
Bill Graham
2012-04-22, 16:23
|
-
Feedback on real world production experience with Flume
Karl Hennig 2012-04-20, 22:14
I am investigating automated methods of moving our data from the web tier into HDFS for processing, a process that's performed periodically.
I am looking for feedback from anyone who has actually used Flume in a production setup (redundant, failover) successfully. I understand it is now being largely rearchitected during its incubation as Apache Flume-NG, so I don't have full confidence in the old, stable releases.
The other option would be to write our own tools. What methods are you using for these kinds of tasks? Did you write your own or does Flume (or something else) work for you?
I'm also on the Flume mailing list, but I wanted to ask these questions here because I'm interested in Flume _and_ alternatives.
Thank you!
-
Re: Feedback on real world production experience with Flume
alo alt 2012-04-21, 09:07
Hi,
In my former job: production, Germany, web portal. Throughput 600 mb/minute. Log files from Windows IIS, Apache. Used in the usual way, no custom decorators or sinks. Simply syslog -> bucketing (1-minute rollover) -> HDFS, split into minutes (YYYYMMDDHHMM). Stable, with some issues (you'll find them on the mailing list), but it works well if you know what to do when something goes wrong. Btw, NG 1.1.0 is more stable than Flume pre-1.x and runs in some production environments.
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com
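The bucketing step described in this message (1-minute rollover, output split into YYYYMMDDHHMM buckets) can be sketched roughly as below. This is only an illustration of the naming scheme, not Flume's own implementation; the base directory `/logs/web` and both function names are assumptions made for the example.

```python
from datetime import datetime


def minute_bucket(ts: datetime) -> str:
    """Return the YYYYMMDDHHMM bucket name for a timestamp (seconds are dropped)."""
    return ts.strftime("%Y%m%d%H%M")


def bucket_path(base_dir: str, ts: datetime) -> str:
    """Build the directory an event with timestamp ts would be written to.

    base_dir is a hypothetical target directory, not a path from the thread.
    """
    return f"{base_dir.rstrip('/')}/{minute_bucket(ts)}"


if __name__ == "__main__":
    # Two events inside the same minute share a bucket; the next minute rolls over.
    first = datetime(2012, 4, 21, 9, 7, 5)
    second = datetime(2012, 4, 21, 9, 7, 59)
    third = datetime(2012, 4, 21, 9, 8, 0)
    for event_time in (first, second, third):
        print(bucket_path("/logs/web", event_time))
```

Naming buckets by truncated timestamp means a downstream job can pick up a whole minute of data by path alone, without inspecting file contents.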
-
Re: Feedback on real world production experience with Flume
M. C. Srivas 2012-04-21, 14:06
Karl,
since you did ask for alternatives, people using MapR prefer to use NFS access to deposit data directly (or access it). It works seamlessly from all Linuxes, Solaris, Windows, AIX and a myriad of other legacy systems without having to load any agents on those machines. And it is fully automatic HA.
Since compression is built into MapR, the data gets compressed automatically as it comes in over NFS, without much fuss.
With regard to performance, you can get about 870 MB/s per node if you have 10GigE attached (of course, with compression, the effective throughput will surpass that, depending on how well the data can be squeezed).
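To put the 870 MB/s figure in context, the arithmetic below compares it with the raw line rate of a 10 Gbit/s link and shows one way to read the "effective throughput" remark. The compression ratio of 3.0 is purely an assumed value for illustration; the thread gives no ratio, and the function names are made up for this sketch.

```python
def line_rate_mb_per_s(gigabits_per_s: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gigabits_per_s * 1000 / 8


def effective_throughput(stored_mb_per_s: float, compression_ratio: float) -> float:
    """Uncompressed data rate represented by a given rate of compressed bytes.

    compression_ratio is uncompressed size divided by compressed size.
    """
    return stored_mb_per_s * compression_ratio


if __name__ == "__main__":
    raw = line_rate_mb_per_s(10)   # theoretical ceiling of a 10GigE link
    quoted = 870.0                 # MB/s per node, as quoted in the message
    print(f"raw line rate: {raw:.0f} MB/s")
    print(f"quoted rate is {quoted / raw:.1%} of line rate")
    # Assumed ratio, not a figure from the thread.
    print(f"effective at 3.0x: {effective_throughput(quoted, 3.0):.0f} MB/s")
```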
-
Re: Feedback on real world production experience with Flume
alo alt 2012-04-21, 15:42
We decided NO product and vendor advertising on Apache mailing lists!
I do not understand why you would bring that closed-source stuff from your employer into the room. It has nothing to do with Flume or the use cases!
--
Alexander Lorenz
http://mapredit.blogspot.com
-
Re: Feedback on real world production experience with Flume
Edward Capriolo 2012-04-21, 22:23
It seems pretty relevant. If you can directly log via NFS that is a viable alternative.
-
Re: Feedback on real world production experience with Flume
Chen He 2012-04-21, 23:50
Can NFS become the bottleneck?
Chen
-
Re: Feedback on real world production experience with Flume
Alexander Lorenz 2012-04-22, 04:59
No. That is the Flume open source mailing list, not a vendor list.
NFS logging has nothing to do with decentralized collectors like Flume, JMS or Scribe.
-
Re: Feedback on real world production experience with Flume
Edward Capriolo 2012-04-22, 14:14
I think this is valid to talk about. For example, one does not need a decentralized collector if one can just write logs directly to decentralized files in a decentralized file system. In any case, it was not even a hard vendor pitch. It was someone describing how they handle centralized logging. It stated facts and it was informative.
Let's face it: if fuse-mounting HDFS or directly soft-mounting NFS worked in a way that performs well, many of the use cases for Flume- and Scribe-like tools would be gone (not all, but many).
I never knew there was a rule against discussing alternative software on a mailing list. It seems like a closed-minded thing. I also doubt the ASF would back a rule like that. Are we not allowed to talk about EMR or S3, or am I not even allowed to mention S3?
Can Flume run on EC2 and log to S3? (Oops, party foul, I guess I can't ask that.)
Edward
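The idea of writing logs straight into a mounted cluster file system, instead of shipping them through a collector, might look roughly like the sketch below. It assumes the NFS (or FUSE) mount already exists; the function name, logger name and directory layout are hypothetical, and a temporary directory stands in for the mount point so the example runs anywhere.

```python
import logging
import os
import socket
import tempfile


def make_mount_logger(mount_dir: str, name: str = "weblog") -> logging.Logger:
    """Return a logger that appends to a per-host file under mount_dir.

    mount_dir is assumed to be an already-mounted NFS or FUSE path;
    nothing here performs the mount itself.
    """
    os.makedirs(mount_dir, exist_ok=True)
    # One file per host, so several web servers never append to the same file.
    log_path = os.path.join(mount_dir, f"{socket.gethostname()}.log")
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # A temporary directory stands in for the real mount point.
    stand_in_mount = tempfile.mkdtemp()
    log = make_mount_logger(stand_in_mount)
    log.info("GET /index.html 200")
    print(os.listdir(stand_in_mount))
```

Writing one file per host is a design choice of this sketch rather than something stated in the thread; it sidesteps the question of how concurrent appends from several clients to a single shared file would behave.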
-
Re: Feedback on real world production experience with Flume
Michael Segel 2012-04-22, 15:34
Gee Edward, what about putting a link to a company website or your blog in your signature... ;-)
Seriously, one could also mention fuse, right? ;-)
-
Re: Feedback on real world production experience with Flume
Bill Graham 2012-04-22, 16:23
+1 on Edward's comment.
The MapR comment was relevant and informative and the original poster never said he was only interested in open source options.
--
*Note that I'm no longer using my Yahoo! email address. Please email me at [EMAIL PROTECTED] going forward.*
|