1) If the sources and sinks provided by the community are good enough for
you, don't write your own. A lot of work has already been done there, so
try the existing ones before writing a custom source or sink.
2) For reliability you should use a file-based channel, which you already
plan to do. As for failed messages, I think you need to handle them
yourself. You will probably need to write some code that catches a failing
message and routes it to some other, more lenient channel, such as a file;
in the case of HBase you could write the whole failing message to a
different table.
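
The catch-and-reroute idea can be sketched roughly as follows. This is
only an illustration of the logic, written in Python and independent of
the Flume API (a real custom sink would be Java). Every name here
(deliver, write_primary, write_fallback, main_table, failed_table) is
hypothetical, and the two lists stand in for the real HBase tables.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event shape: {"id": ..., "body": ...}


def deliver(event: Event,
            write_primary: Callable[[Event], None],
            write_fallback: Callable[[Event, str], None]) -> bool:
    """Try the primary store; on failure, park the whole event in the fallback.

    Returns True if the primary write succeeded, False if the event was
    diverted. An exception raised by the fallback itself is not caught, so
    the caller can roll back and retry instead of losing the event.
    """
    try:
        write_primary(event)
        return True
    except Exception as err:
        write_fallback(event, repr(err))
        return False


# In-memory stand-ins for the real stores (assumptions for this sketch).
main_table: List[Event] = []
failed_table: List[Dict[str, str]] = []


def write_primary(event: Event) -> None:
    # Simulated failure: reject events with an empty body.
    if not event.get("body"):
        raise ValueError("empty body")
    main_table.append(event)


def write_fallback(event: Event, reason: str) -> None:
    # Keep the whole failing event plus the reason it failed.
    failed_table.append({**event, "reason": reason})


results = [deliver(e, write_primary, write_fallback)
           for e in ({"id": "1", "body": "ok"}, {"id": "2", "body": ""})]
```

Storing the failure reason next to the event makes it easier to replay
the fallback table later, once the cause has been fixed.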
You will also need to deal with duplicate messages if you use a reliable
channel (such as the file-based channel).
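
One common way to cope with duplicates is to make the write idempotent,
so that a redelivered event lands on the same row instead of creating a
new one. The sketch below shows that idea in Python, independent of the
Flume and HBase APIs. The names row_key and idempotent_put are
hypothetical, the dict stands in for the destination table, and hashing
the source name together with the body is just one assumed choice of key.

```python
import hashlib
from typing import Dict


def row_key(source: str, body: bytes) -> str:
    """Derive a deterministic key from the event's origin and content."""
    digest = hashlib.sha256(source.encode("utf-8") + b"\x00" + body)
    return digest.hexdigest()


def idempotent_put(table: Dict[str, bytes], source: str, body: bytes) -> bool:
    """Write the event; return True only if the key was not seen before."""
    key = row_key(source, body)
    is_new = key not in table
    table[key] = body  # a replayed event overwrites the same row
    return is_new


table: Dict[str, bytes] = {}  # in-memory stand-in for the destination table
first = idempotent_put(table, "host-a", b"record 1")
replay = idempotent_put(table, "host-a", b"record 1")  # redelivered event
other = idempotent_put(table, "host-a", b"record 2")
```

Note that a purely content-based key also collapses two records that are
legitimately identical. If that can happen in your data, include
something unique in the key, such as a record id or a file offset.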
On Wed, Jan 15, 2014 at 10:43 AM, AnilKumar B <[EMAIL PROTECTED]> wrote:
> I am planning to use file based channels.
> Thanks & Regards,
> B Anil Kumar.
> On Wed, Jan 15, 2014 at 3:12 PM, AnilKumar B <[EMAIL PROTECTED]> wrote:
>> In our pipeline we are thinking of using Flume. Our data source can be
>> a filer, HBase, or Couchbase, and the sink is either a downstream filer
>> or another (downstream) HBase cluster.
>> So I need some help with the following.
>> 1) To handle multiple sources and sinks, do I need to write custom Flume
>> sinks and sources, or should I use the community's respective sources
>> and sinks?
>> 2) We cannot lose any data. Is there any mechanism in Flume to handle
>> failed messages? For example, if Flume fails to write records into
>> HBase, how exactly does it take care of that? Or should I maintain the
>> state of each record and handle failed messages based on its state? Is
>> that the correct way? I am trying to use ZooKeeper for state management,
>> so I just want to know whether my approach is correct.
>> Thanks & Regards,
>> B Anil Kumar.
*Muhammad Ehsan ul Haque*