|
|
-
Can we treat a whole file as a Flume event? Henry Ma 2013-01-22, 01:45
Hi,

When using Flume to collect log files, we want to simply COPY the original files from several servers to central storage (a Unix file system), rather than rolling them up into one big file, because we must record some metadata about each original file, such as its name, host, path, and timestamp. We also want to guarantee total reliability: no missing files and no duplicated files.

It seems that, in the Source, we must put a whole file (between 100KB and 100MB in size) into a single Flume event, and in the Sink, we must write each event to a single file.

Is this practicable? Thanks!

--
Best Regards,
Henry Ma
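The idea in the question (one file per event, with the file's metadata carried alongside it) can be sketched in plain Java. This is only an illustration under stated assumptions: `FileEvent`, `toEvent`, `writeEvent`, and the header keys `name`, `host`, `path`, and `timestamp` are hypothetical names chosen for this sketch, not Flume classes or Flume header conventions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types come from Flume.
class WholeFileEventSketch {

    /** Stand-in for an event: string headers plus an opaque byte body. */
    public static final class FileEvent {
        public final Map<String, String> headers;
        public final byte[] body;

        public FileEvent(Map<String, String> headers, byte[] body) {
            this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
            this.body = body.clone();
        }
    }

    /** "Source" side: load one complete file into a single event. */
    public static FileEvent toEvent(Path file, String host) {
        try {
            Map<String, String> headers = new LinkedHashMap<>();
            headers.put("name", file.getFileName().toString());
            headers.put("host", host);
            headers.put("path", file.toAbsolutePath().toString());
            headers.put("timestamp",
                    Long.toString(Files.getLastModifiedTime(file).toMillis()));
            // The whole file is held in memory here; that is the cost of this design.
            return new FileEvent(headers, Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** "Sink" side: write exactly one output file per event, at targetDir/host/name. */
    public static Path writeEvent(FileEvent event, Path targetDir) {
        try {
            Path dir = targetDir.resolve(event.headers.get("host"));
            Files.createDirectories(dir);
            // Files.write creates the file, or truncates an existing one of the same name.
            return Files.write(dir.resolve(event.headers.get("name")), event.body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this sketch overwrites a target file of the same name and says nothing about delivery guarantees; the "no missing file, no duplicated file" requirement would need to be handled separately.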
-
Re: Can we treat a whole file as a Flume event? Nitin Pawar 2013-01-22, 05:22
Why don't you use directory spooling?

--
Nitin Pawar
-
Re: Can we treat a whole file as a Flume event? Henry Ma 2013-01-22, 06:49
As far as I know, the Directory Spooling Source sends a file line by line, one event per line, and the File Roll Sink receives these lines and rolls them up into a big file at a fixed interval. Is that right, and can we configure it to send the whole file as a single event?

--
Best Regards,
Henry Ma
-
Re: Can we treat a whole file as a Flume event? Nitin Pawar 2013-01-22, 07:37
You can't configure it to send the entire file in one event unless you have a fixed number of events in each of the files. Basically, it reads the entire file into a channel and then starts writing.

So as long as you can limit the events in the file, I think you can send the entire file as a transaction, but not as a single event. As far as I understand, Flume treats individual lines in the file as events.

If you want to pull the entire file, then you may want to implement that with messaging queues, where you send an event to an ActiveMQ queue and your consumer then pulls the file in one transaction with some other mechanism such as FTP or SCP.

Others will have a better idea; I am just suggesting a crude way to get the entire file as a single event.

--
Nitin Pawar
-
Re: Can we treat a whole file as a Flume event? Mike Percy 2013-01-22, 08:17
Check out the latest changes to SpoolingFileSource w.r.t. EventDeserializers on trunk. You can deserialize a whole file that way if you want. Whether that is a good idea depends on your use case, though.

It's on trunk, lacking user docs for the latest changes, but I will try to hammer out updated docs soon. In the meantime, you can just look at the code and read the comments.

Regards,
Mike
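To make the difference between line-per-event and whole-file-per-event deserialization concrete, here is a minimal sketch in plain Java. The `BodySplitter` interface, the `perLine` and `wholeFile` factories, and the `maxBytes` cap are all hypothetical names invented for this illustration; Flume's real `EventDeserializer` interface on trunk is different, so treat this as a picture of the idea rather than the actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: this is NOT Flume's EventDeserializer interface.
class DeserializerSketch {

    /** Turns the contents of one file into one or more event bodies. */
    public interface BodySplitter {
        List<byte[]> split(byte[] fileContents);
    }

    /** One event per line, split on '\n'; empty lines are skipped. */
    public static BodySplitter perLine() {
        return contents -> {
            List<byte[]> events = new ArrayList<>();
            String text = new String(contents, StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                if (!line.isEmpty()) {
                    events.add(line.getBytes(StandardCharsets.UTF_8));
                }
            }
            return events;
        };
    }

    /** The whole file as a single event, rejected if it exceeds maxBytes. */
    public static BodySplitter wholeFile(int maxBytes) {
        return contents -> {
            if (contents.length > maxBytes) {
                throw new IllegalArgumentException(
                        "file of " + contents.length + " bytes exceeds cap of " + maxBytes);
            }
            return Collections.singletonList(contents.clone());
        };
    }
}
```

The size cap is included because a whole-file event has no natural upper bound; rejecting oversized input early is one way to keep event sizes predictable.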
-
Re: Can we treat a whole file as a Flume event? Roshan Naik 2013-01-22, 23:38
I recall some discussion about being cautious about the size of events (in this case, the file being moved), as Flume is not quite intended for large events. Mike, perhaps you can shed some light on that aspect?
-
Re: Can we treat a whole file as a Flume event? Mike Percy 2013-01-23, 02:39
Hi Roshan,
Yep, in general I'd have concerns w.r.t. capacity planning and garbage collector behavior for large events. Flume holds at least one event batch in memory at once, depending on the number of sources/sinks, and even with a batch size of 1, if you have unpredictably large events there is nothing preventing an OutOfMemoryError in extreme cases. But if you plan for capacity and test thoroughly, then it can be made to work.

Regards,
Mike
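For the capacity-planning point, a back-of-the-envelope estimate can be sketched as follows. The formula (largest event size times batch size times the number of batches held at once) is an assumption made for this illustration, not Flume's actual memory model; it ignores object overhead, channel buffering, and garbage collector headroom, so it is at best a rough lower bound. `worstCaseBytes`, `fitsInHeap`, and `usableFraction` are hypothetical names.

```java
// Hypothetical sketch: a rough lower bound, not Flume's real memory accounting.
class BatchMemoryEstimate {

    /** Assumed estimate: largest event * events per batch * batches held in memory at once. */
    static long worstCaseBytes(long maxEventBytes, int batchSize, int batchesInFlight) {
        if (maxEventBytes < 0 || batchSize < 1 || batchesInFlight < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(
                Math.multiplyExact(maxEventBytes, (long) batchSize),
                (long) batchesInFlight);
    }

    /** True if the estimate fits within the given fraction of the heap. */
    static boolean fitsInHeap(long estimateBytes, long heapBytes, double usableFraction) {
        return estimateBytes <= (long) (heapBytes * usableFraction);
    }

    public static void main(String[] args) {
        // 100MB events (the upper size from the original question), batch size 1,
        // and one batch each for a single source and a single sink.
        long estimate = worstCaseBytes(100L * 1024 * 1024, 1, 2);
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("estimated bytes: " + estimate);
        System.out.println("fits in half of max heap: " + fitsInHeap(estimate, heap, 0.5));
    }
}
```

Even with a batch size of 1, the estimate scales directly with the largest event, which is why unpredictably large files are the risky case.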
-
Re: Can we treat a whole file as a Flume event? Roshan Naik 2013-01-23, 05:23
Mike,
Where is the SpoolingFileSource that you are referring to?

-roshan
-
Re: Can we treat a whole file as a Flume event? Mike Percy 2013-01-23, 19:53
https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java
-
Re: Can we treat a whole file as a Flume event? Roshan Naik 2013-01-23, 21:04
That's SpoolDirectorySource.java. I thought you referred to SpoolingFileSource earlier; I assume that was a typo?
-
Re: Can we treat a whole file as a Flume event? Mike Percy 2013-01-23, 21:18
Yep, my bad, typo :)
|