Flume is designed to transfer a continuous stream of events into Hadoop.
It appears that in your use case each gzip file is a collection of events
that needs to be moved. The closest thing I can see to Flume supporting
your use case is the spooling directory source
... which has not yet been released.
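
In the meantime, one possible workaround with the exec source you already
have is to decompress on the source side so that Flume only sees plain
lines. Below is a minimal sketch, assuming each line inside a gzip file is
one event and that the files are complete before the script reads them. The
names iter_events and stream_directory are hypothetical helpers, not part of
Flume, and this is untested against Flume 1.2.0.

```python
import gzip
import sys
from pathlib import Path


def iter_events(gz_path):
    """Yield one decoded line (assumed to be one event) from a gzip file."""
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")


def stream_directory(directory, out=sys.stdout):
    """Write every line of every *.gz file in `directory` to `out`.

    Files are processed in sorted name order. Returns the number of
    lines written.
    """
    count = 0
    for gz_path in sorted(Path(directory).glob("*.gz")):
        for event in iter_events(gz_path):
            out.write(event + "\n")
            count += 1
    out.flush()
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python stream_gz.py /path/to/gzip/dir
    stream_directory(sys.argv[1])
```

The exec source could then be pointed at a command that runs this script, and
the HDFS sink's own compression options could recompress the output. Keep in
mind that the exec source gives no delivery guarantee if the command or the
agent fails part way through a file, so this is a stopgap rather than a
reliable transfer.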
On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
> Hi Harish,
> I am still exploring my options, and that's part of my question too: which
> source should I be using?
> Currently I have set up my Flume NG configuration with an exec source, a
> file channel and an HDFS sink, but I can switch to a
> different source if it handles the compressed files.
> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala <[EMAIL PROTECTED]> wrote:
>> Which of the Flume sources are you trying to use?
>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>> My application servers produce data files that are in compressed format
>>> (gzip). I am planning to use Flume NG (1.2.0) to collect those files and
>>> transfer them to a Hadoop cluster (write to HDFS). Is it possible to read and
>>> transfer them without uncompressing first? My sink would be HDFS, and there
>>> are options to compress before writing to HDFS. That would work fine if my
>>> source were an uncompressed text file and I needed to store the HDFS file in
>>> compressed format. But in my case, the source itself is compressed. What
>>> would be the best options to handle such cases?
>>> Thanks for your help.