Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Flume >> mail # dev >> Re: [jira] [Issue Comment Deleted] (FLUME-2173) Exactly once semantics for Flume


Copy link to this message
-
Re: [jira] [Issue Comment Deleted] (FLUME-2173) Exactly once semantics for Flume
Hi Hari,
 

I deleted my comment (again). The mailing list is probably a better avenue
to discuss this ­ sorry about that! :)

I can find at least one other way duplicate events can occur, and so what
I provided helps to reduce duplicate events but is not sufficient to
guaranty exactly once semantics. However, I still think that using a
2-phase commit when writing to multiple channels would benefit Flume. This
should probably be a different ticket though.

Concerning the algorithm you offered, the case of replicating channel
selector should probably be handled, by creating a new UUID for each
duplicate message.
I hope this helps.
Regards,

Gabriel
On 8/25/13 7:27 AM, "Gabriel Commeau (JIRA)" <[EMAIL PROTECTED]> wrote:

>
>     [
>https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.p
>lugin.system.issuetabpanels:all-tabpanel ]
>
>Gabriel Commeau updated FLUME-2173:
>-----------------------------------
>
>    Comment: was deleted
>
>(was: I would approach the problem from a different angle. The way I see
>it, there are two main places where duplicates can occur: when using
>multiple channels for one source (using a replication channel selector),
>and when the "output" of a sink cannot guaranty whether the event has
>truly been committed or not (as you pointed out for example, HDFS writing
>the event but throwing an exception).
>Actually, I don¹t think there is a general solution to the problem of
>output systems (e.g. HDFS) that do not guaranty whether the event is
>truly committed or not, because we¹d need to enforce this requirement on
>3rd party systems (relative to Flume). I see it as a problem to be solved
>on a case-by-case basis for each sink.
>
>However, I would like to suggest a solution to the first problem. Here is
>an example to illustrate it: Pretend an agent has a source that writes to
>two (required) channels. As part of a transaction, the channel processor
>will commit to the first channel, which succeeds, and then to the second
>channel, which fails. The whole transaction will fail, but the event has
>already been committed once to the first channel. When the transaction is
>retried, the event will be duplicated.
>The solution I discussed a few months back with Mike P. was to use a
>two-phase commit when writing to channels. This insures that the events
>are not actually committed to a channel if the following ones fail. This
>however will require an API change on the Channel interface. I would
>suggest adding a preparePut method returning a boolean, which would be
>the ³voting² phase. The put method becomes the commit phase. To make it
>backward compatible, we'd implement preparePut to always return true in
>the AbstractChannel.
>
>I hope this helps.
>)
>    
>> Exactly once semantics for Flume
>> --------------------------------
>>
>>                 Key: FLUME-2173
>>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>>             Project: Flume
>>          Issue Type: Bug
>>            Reporter: Hari Shreedharan
>>            Assignee: Hari Shreedharan
>>
>> Currently Flume guarantees only at least once semantics. This jira is
>>meant to track exactly once semantics for Flume. My initial idea is to
>>include uuid event ids on events at the original source (use a config to
>>mark a source an original source) and identify destination sinks. At the
>>destination sinks, use a unique ZK Znode to track the events. If once
>>seen (and configured), pull the duplicate out.
>> This might need some refactoring, but my belief is we can do this in a
>>backward compatible way.
>
>--
>This message is automatically generated by JIRA.
>If you think it was sent incorrectly, please contact your JIRA
>administrators
>For more information on JIRA, see: http://www.atlassian.com/software/jira
+
Hari Shreedharan 2013-08-25, 16:07
+
Arvind Prabhakar 2013-08-27, 21:12
+
Hari Shreedharan 2013-08-28, 00:35
+
Hari Shreedharan 2013-08-28, 00:56
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB