Flume >> mail # dev >> Flume custom decorator for Rolling FileSink output bucketing

Re: Flume custom decorator for Rolling FileSink output bucketing
Dib, that article refers to Flume OG (0.95) and is not relevant to the
current release.

In the past I looked at fixing the file sink to use the same bucketing
available to the HDFS sink, but unfortunately it seemed like it would
take more than a quick fix. The PathManager currently works with only
one File at a time, and the rolling logic is tied to that. You'd
basically have to replace most of the logic, ideally reusing the
bucketing logic from the HDFS sink. As Mike said, you should probably
just use the HDFS sink with a file:// path unless you feel like
improving the current sink.
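
For reference, here is a minimal sketch of what the file:// approach could look like in an agent configuration. The agent and component names (a1, r1, k1), the interceptor names (ts, host), and the /var/log/flume base directory are assumptions made for illustration, and the source and channel definitions are omitted. It relies on the timestamp and host interceptors to populate the headers that the path escapes read, and it still needs the Hadoop libraries on Flume's classpath even though nothing is written to HDFS.

```properties
# Hypothetical names: agent "a1", source "r1", sink "k1".
# Interceptors add the "timestamp" and "host" headers used by the path below.
a1.sources.r1.interceptors = ts host
a1.sources.r1.interceptors.ts.type = timestamp
a1.sources.r1.interceptors.host.type = host
a1.sources.r1.interceptors.host.hostHeader = host
a1.sources.r1.interceptors.host.useIP = false

# HDFS sink pointed at the local file system, bucketed by host and day.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = file:///var/log/flume/%{host}/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = events
# Write plain text rather than SequenceFiles so grep / awk work on the output.
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Roll once an hour; disable size-based and count-based rolling.
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```

With a layout like this, each host gets its own directory with one subdirectory per day, which should keep the files easy to search with ordinary shell tools.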

On 03/16/2013 06:20 AM, Dibyajyoti Ghosh wrote:
> Thanks Mike for the suggestion. The reason I am considering a regular file
> system for log storage is to avoid latency when retrieving files, as
> well as to let users scrape log files using grep / awk and the multitude
> of other powerful commands available on conventional storage.
> I am now thinking of writing my own decorator classes for the
> RollingFile sink. Any pointers on how to get started writing
> custom decorators?
> Another quick question: can you, Mike, or somebody from the Flume community
> tell me how to use the commands documented here:
> http://archive.cloudera.com/cdh/3/flume/UserGuide/#_introducing_sink_decorators
> Are they available in the flume-ng distributed with the Cloudera solution,
> i.e. Flume 1.3.0?
> Best and thanks a lot again,
> - dib
> On Fri, Mar 15, 2013 at 12:38 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>> Dib, you could use the HDFS sink with a file:// URL as an option.
>> Regards,
>> Mike
>> On Fri, Mar 15, 2013 at 12:16 PM, Dibyajyoti Ghosh <[EMAIL PROTECTED]> wrote:
>>> Dear flume team,
>>> I am using Flume 1.3.0, bundled with the Cloudera 4.2.0 distribution, to
>>> log to the local file system. But the current implementation of FileSink
>>> doesn't have inline decorators like the HDFS sink, where output can be
>>> stored in directories based on event metadata, e.g. the hostname of the
>>> event, a timestamp, or some other attribute of the message object.
>>> How can I do the same for FileSink?
>>> Thanks a lot,
>>> - dib