Flume, mail # user - Can we treat a whole file as a Flume event?


Henry Ma 2013-01-22, 01:45
Nitin Pawar 2013-01-22, 05:22
Henry Ma 2013-01-22, 06:49
Nitin Pawar 2013-01-22, 07:37
Mike Percy 2013-01-22, 08:17
Roshan Naik 2013-01-22, 23:38
Mike Percy 2013-01-23, 02:39
Re: Can we treat a whole file as a Flume event?
Roshan Naik 2013-01-23, 05:23
Mike,
   Where is the SpoolingFileSource that you are referring to?

-roshan
On Tue, Jan 22, 2013 at 6:39 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Hi Roshan,
> Yep, in general I'd have concerns w.r.t. capacity planning and garbage
> collector behavior for large events. Flume holds at least one event batch
> in memory at once, depending on # of sources/sinks, and even with a batch
> size of 1 if you have unpredictably large events there is nothing
> preventing an OutOfMemoryError in extreme cases. But if you plan for
> capacity and test thoroughly then it can be made to work.
>
> Regards,
> Mike
>
>
> On Tue, Jan 22, 2013 at 3:38 PM, Roshan Naik <[EMAIL PROTECTED]> wrote:
>
>> I recall some discussion about being cautious with the size of the
>> events (in this case, the file being moved), as Flume is not really
>> intended for large events. Mike, perhaps you can shed some light on that
>> aspect?
>>
>>
>> On Tue, Jan 22, 2013 at 12:17 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Check out the latest changes to SpoolingFileSource w.r.t.
>>> EventDeserializers on trunk. You can deserialize a whole file that way if
>>> you want. Whether that is a good idea depends on your use case, though.
>>>
>>> It's on trunk, lacking user docs for the latest changes but I will try
>>> to hammer out updated docs soon. In the meantime, you can just look at the
>>> code and read the comments.
>>>
>>> Regards,
>>> Mike
>>>
>>> On Monday, January 21, 2013, Nitin Pawar wrote:
>>>
>>>> You can't configure it to send the entire file in one event unless you
>>>> have a fixed number of events in each of the files. Basically, it reads
>>>> the entire file into a channel and then starts writing.
>>>>
>>>> So as long as you can limit the events in the file, I think you can
>>>> send the entire file as a transaction, but not as a single event. As
>>>> far as I understand, Flume treats individual lines in the file as
>>>> events.
>>>>
>>>> If you want to pull the entire file, you may want to implement that
>>>> with a message queue: you send an event to an ActiveMQ queue, and your
>>>> consumer then pulls the file in one transaction with some other
>>>> mechanism such as FTP or SCP.
>>>>
>>>> Others will have better ideas; I am just suggesting a crude way to get
>>>> the entire file as a single event.
>>>>
>>>>
>>>> On Tue, Jan 22, 2013 at 12:19 PM, Henry Ma <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> As far as I know, the Spooling Directory Source sends a file line by
>>>>> line, one event per line, and the File Roll Sink receives these lines
>>>>> and rolls them up into a big file at a fixed interval. Is that right,
>>>>> and can we configure it to send the whole file as one event?
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2013 at 1:22 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Why don't you use directory spooling?
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 22, 2013 at 7:15 AM, Henry Ma <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> When using Flume to collect log files, we want to just COPY the
>>>>>>> original files from several servers to central storage (a Unix file
>>>>>>> system), not roll them up into one big file, because we must record
>>>>>>> some metadata of each original file such as name, host, path,
>>>>>>> timestamp, etc. Besides, we want to guarantee total reliability: no
>>>>>>> missing files and no duplicated files.
>>>>>>>
>>>>>>> It seems that, in the Source, we must put a whole file (size may be
>>>>>>> between 100KB and 100MB) into one Flume event, and in the Sink, we
>>>>>>> must write each event to a single file.
>>>>>>>
>>>>>>> Is this practical? Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Henry Ma
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nitin Pawar
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Henry Ma
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>
>
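
To make the trade-off discussed above concrete, here is a minimal sketch of the "one file, one event" idea in plain Java. It is not Flume code and does not use Flume's Event or EventDeserializer interfaces: the class WholeFileEvent, its methods fromFile and writeTo, the header keys, and the maxBytes cap are all hypothetical names chosen for illustration. The cap reflects Mike's warning that an unexpectedly large event can exhaust the heap, and the temp-file-then-rename step is one assumed way to approach Henry's "no missing files, no duplicated files" requirement on the sink side.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an event: string headers plus an opaque body. */
final class WholeFileEvent {
    final Map<String, String> headers;
    final byte[] body;

    WholeFileEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }

    /** Source side: load one file into one event, refusing files above maxBytes. */
    static WholeFileEvent fromFile(Path file, String host, long maxBytes) throws IOException {
        long size = Files.size(file);
        if (size > maxBytes) {
            // Guard against the unpredictably large events mentioned in the thread.
            throw new IOException("File too large for a single event: " + size + " bytes");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("name", file.getFileName().toString());
        headers.put("path", file.toAbsolutePath().toString());
        headers.put("host", host);
        headers.put("timestamp", Long.toString(Files.getLastModifiedTime(file).toMillis()));
        // The whole file is held in memory here, which is the capacity concern.
        return new WholeFileEvent(headers, Files.readAllBytes(file));
    }

    /**
     * Sink side: write the event body to its own file named host_name.
     * Returns false without writing if that file already exists, so a
     * redelivered event does not produce a duplicate.
     */
    static boolean writeTo(Path targetDir, WholeFileEvent event) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(
                event.headers.get("host") + "_" + event.headers.get("name"));
        if (Files.exists(target)) {
            return false;
        }
        // Write to a temp file first so readers never see a half-written file.
        Path tmp = Files.createTempFile(targetDir, "incoming", ".tmp");
        Files.write(tmp, event.body);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }
}
```

With file sizes of up to 100MB as in Henry's case, each in-flight event costs at least that much heap, so maxBytes and the JVM heap size would need to be planned together. The existence check and the move are not one atomic step, so concurrent writers to the same target would need extra coordination.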
Mike Percy 2013-01-23, 19:53
Roshan Naik 2013-01-23, 21:04
Mike Percy 2013-01-23, 21:18