Flume, mail # dev - Flume custom decorator for Rolling FileSink output bucketing


Dibyajyoti Ghosh 2013-03-15, 19:16
Mike Percy 2013-03-15, 19:38
Dibyajyoti Ghosh 2013-03-15, 21:20
Juhani Connolly 2013-03-18, 01:55
Dibyajyoti Ghosh 2013-03-18, 17:28
Re: Flume custom decorator for Rolling FileSink output bucketing
Juhani Connolly 2013-03-19, 03:15
At this time, I don't think anyone is working on that. I would like to
do it myself, but to be honest, I have a lot of other things to deal with
right now, so I don't see myself working on it any time in the near future.

However, if someone were to post a patch, I should be able to find the time
to review and commit it.

On 03/19/2013 02:28 AM, Dibyajyoti Ghosh wrote:
> Hi Juhani,
>
> Thank you very much for clarifying the doubts I have had about the
> documentation for quite some time now. I downloaded the Flume source from git
> and am now looking into the HDFS sink code base. Like you said, it will not be
> a small patch. I will keep the community posted about the changes.
>
> Are you aware of any plans to implement output bucketing (i.e. dynamic
> paths) in the FileRoll sink in upcoming releases of Flume?
>
> thanks a lot,
> - dib
>
>
> On Sun, Mar 17, 2013 at 6:55 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
>
>> Dib, that article refers to Flume OG (0.95); it's not relevant to
>> the current release.
>>
>> I had looked in the past at fixing the file sink to use the same
>> bucketing available to the HDFS sink, but unfortunately it seemed like it
>> would take more than a quick fix. The PathManager currently only works with
>> one File at a time, and the rolling logic is tied to that. You'd
>> basically have to replace most of the logic, ideally reusing the bucketing
>> logic from the HDFS sink. As Mike said, you should probably just use the
>> HDFS sink with a file:// URL unless you feel like improving the current sink.
>>
>>
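Juhani's point about the PathManager is the core of the problem: a bucketed file sink has to keep several files open at once, one per resolved directory, and roll each of them independently. The Java sketch below illustrates that bookkeeping under stated assumptions. `BucketedFileWriter`, the `events-N.log` naming and the roll-after-N-events policy are all hypothetical; this is not Flume's PathManager or RollingFileSink, and it leaves out Flume's Sink/Channel transaction handling entirely.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Flume code): keeps one open writer per bucket
 * directory and rolls each bucket to a new file after a fixed number of
 * events, instead of tracking a single "current" file.
 */
final class BucketedFileWriter implements AutoCloseable {
    private final Path root;
    private final int eventsPerFile;                        // assumed roll policy: by event count
    private final Map<String, BufferedWriter> open = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();    // events in the current file
    private final Map<String, Integer> fileIndex = new HashMap<>(); // last file number per bucket

    BucketedFileWriter(Path root, int eventsPerFile) {
        this.root = root;
        this.eventsPerFile = eventsPerFile;
    }

    /** Appends one event body as a line to the current file of its bucket. */
    void write(String bucket, String body) throws IOException {
        BufferedWriter w = open.get(bucket);
        if (w == null || counts.get(bucket) >= eventsPerFile) {
            if (w != null) {
                w.close();                                  // roll: close the full file
            }
            int idx = fileIndex.merge(bucket, 1, Integer::sum);
            Path dir = root.resolve(bucket);                // e.g. root/web01/2013-03-15
            Files.createDirectories(dir);
            // Default options create the file or truncate an existing one.
            w = Files.newBufferedWriter(dir.resolve("events-" + idx + ".log"),
                                        StandardCharsets.UTF_8);
            open.put(bucket, w);
            counts.put(bucket, 0);
        }
        w.write(body);
        w.newLine();
        counts.merge(bucket, 1, Integer::sum);
    }

    /** Closes every bucket's current file. */
    @Override
    public void close() throws IOException {
        for (BufferedWriter w : open.values()) {
            w.close();
        }
        open.clear();
    }
}
```

For example, with `eventsPerFile = 2`, three writes to bucket `web01/2013-03-15` produce `events-1.log` with two lines and `events-2.log` with one. A real sink would also need to close idle buckets (otherwise file handles accumulate as new directories appear) and to append rather than truncate when it is restarted.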
>> On 03/16/2013 06:20 AM, Dibyajyoti Ghosh wrote:
>>
>>> Thanks, Mike, for the suggestion. The reason I am considering a regular local
>>> file system for log storage is to avoid latency issues for file retrieval, as
>>> well as to allow users to scrape log files using grep / awk and the multitude
>>> of other powerful commands available on conventional storage.
>>>
>>> I am now thinking of writing my own decorator classes for the
>>> RollingFile sink. Any pointers on how I can get started writing
>>> custom decorators?
>>>
>>> Another quick question: can you, Mike, or somebody from the Flume community
>>> tell me how to use the commands documented here:
>>> http://archive.cloudera.com/cdh/3/flume/UserGuide/#_introducing_sink_decorators
>>>
>>>
>>> Is this available in the Flume NG distributed with Cloudera's solution, i.e.
>>> Flume 1.3.0?
>>>
>>> Best and thanks a lot again,
>>> - dib
>>>
>>>
>>> On Fri, Mar 15, 2013 at 12:38 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Dib, you could use the HDFS sink with a file:// URL as an option.
>>>>
>>>> Regards,
>>>> Mike
>>>>
>>>>
>>>>
>>>> On Fri, Mar 15, 2013 at 12:16 PM, Dibyajyoti Ghosh <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Dear Flume team,
>>>>>
>>>>> I am using Flume 1.3.0 bundled with the Cloudera 4.2.0 distribution to log
>>>>> to the local file system. But the current implementation of FileSink doesn't
>>>>> have inline decorators like the HDFS sink, where output can be stored in
>>>>> directories based on event metadata, e.g. the hostname of the event, the
>>>>> timestamp, or some other attribute in the message object.
>>>>>
>>>>> How can I do the same for FileSink?
>>>>>
>>>>>
>>>>> Thanks a lot,
>>>>> - dib
>>>>>
>>>>>
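Dib's question comes down to expanding a path template from each event's headers, which is what the HDFS sink's escape sequences (such as `%{host}` and `%Y-%m-%d`) do. Below is a minimal standalone Java (9+) sketch of that expansion. It is not Flume's own implementation: `BucketPathSketch` is a hypothetical name, only `%{header}`, `%Y`, `%m`, `%d`, `%H` and `%%` are handled, and times are rendered in UTC from a `timestamp` header holding epoch milliseconds. (The HDFS sink likewise reads time escapes from a `timestamp` header, but it supports more escapes and its time zone is configurable.)

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Flume's implementation): expands a small subset
 * of HDFS-sink-style escapes in a path template.
 *   %{name}     -> value of event header "name" (empty string if absent)
 *   %Y %m %d %H -> fields of the "timestamp" header (epoch millis, UTC assumed)
 *   %%          -> a literal '%'
 */
final class BucketPathSketch {
    private static final Pattern ESCAPE = Pattern.compile("%\\{([^}]+)\\}|%([YmdH%])");

    static String resolve(String template, Map<String, String> headers) {
        Matcher m = ESCAPE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = (m.group(1) != null)
                    ? headers.getOrDefault(m.group(1), "")       // %{header}
                    : timeField(m.group(2).charAt(0), headers);  // %Y, %m, %d, %H, %%
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String timeField(char code, Map<String, String> headers) {
        if (code == '%') {
            return "%";
        }
        String ts = headers.get("timestamp");
        if (ts == null) {
            throw new IllegalArgumentException("escape %" + code + " needs a timestamp header");
        }
        ZonedDateTime t = Instant.ofEpochMilli(Long.parseLong(ts)).atZone(ZoneOffset.UTC);
        switch (code) {
            case 'Y': return String.format("%04d", t.getYear());
            case 'm': return String.format("%02d", t.getMonthValue());
            case 'd': return String.format("%02d", t.getDayOfMonth());
            case 'H': return String.format("%02d", t.getHour());
            default:  throw new IllegalArgumentException("unsupported escape %" + code);
        }
    }

    public static void main(String[] args) {
        // 1363374960000 ms = 2013-03-15T19:16:00Z
        Map<String, String> headers = Map.of("host", "web01", "timestamp", "1363374960000");
        System.out.println(resolve("/var/log/flume/%{host}/%Y-%m-%d", headers));
        // prints /var/log/flume/web01/2013-03-15
    }
}
```

The resolved string could then serve as the per-event bucket directory for a file-based sink; escapes outside the handled subset are left in the path unchanged.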