Kafka, mail # user
Re: is it possible to commit offsets on a per stream basis?
Jason Rosenberg 2013-09-08, 03:33
To be clear, I forgot to add to my question that I am asking about creating
multiple connectors within the same consumer process (I realize I can
obviously have multiple connectors running on multiple hosts, etc.).  But I'm
guessing that should be fine too?

Jason
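
A minimal sketch of the setup being asked about (Java, against the 0.8-era
high-level consumer API): two connectors created in the same JVM, sharing a
group id and the same whitelist regex.  The ZooKeeper address, group id, and
regex below are illustrative, not taken from the thread.

    import java.util.List;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.consumer.TopicFilter;
    import kafka.consumer.Whitelist;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class TwoConnectorsOneProcess {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");   // illustrative
            props.put("group.id", "my-consumer-group");         // same group id for both connectors
            ConsumerConfig config = new ConsumerConfig(props);

            // Two connectors in the same process, same group, same topic filter.
            ConsumerConnector connectorA = Consumer.createJavaConsumerConnector(config);
            ConsumerConnector connectorB = Consumer.createJavaConsumerConnector(config);

            TopicFilter filter = new Whitelist("mytopics\\..*");   // illustrative regex
            List<KafkaStream<byte[], byte[]>> streamsA =
                connectorA.createMessageStreamsByFilter(filter, 1);
            List<KafkaStream<byte[], byte[]>> streamsB =
                connectorB.createMessageStreamsByFilter(filter, 1);
        }
    }

As Neha notes below, because both connectors share the same group id, the
rebalance should spread the matching topics/partitions across them, whether
they live in one process or on separate hosts.
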
On Sat, Sep 7, 2013 at 3:09 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> >> Can I create multiple connectors, and have each use the same Regex
> >> for the TopicFilter?  Will each connector share the set of available
> >> topics?  Is this safe to do?
>
> >> Or is it necessary to create mutually non-intersecting regexes for each
> >> connector?
>
> As long as each of those consumer connectors shares the same group id,
> Kafka consumer rebalancing should automatically redistribute the
> topic/partitions amongst the consumer connectors/streams evenly.
>
> Thanks,
> Neha
>
>
> On Mon, Sep 2, 2013 at 1:35 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > Will this work if we are using a TopicFilter that can map to multiple
> > topics?  Can I create multiple connectors, and have each use the same
> > Regex for the TopicFilter?  Will each connector share the set of
> > available topics?  Is this safe to do?
> >
> > Or is it necessary to create mutually non-intersecting regexes for each
> > connector?
> >
> > It seems I have a similar issue.  I have been using auto commit mode, but
> > it doesn't guarantee that all committed messages have been successfully
> > processed (it seems a change to the connector itself might expose a way
> > to use auto offset commit, and have it never commit a message until it
> > is processed).  But that would be a change to the
> > ZookeeperConsumerConnector.  Essentially, it would be great if, after
> > processing each message, we could mark the message as 'processed', and
> > thus use that status as the max offset to commit when the auto offset
> > commit background thread wakes up each time.
> >
> > Jason
> >
> >
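
One way to approximate this today, without changing ZookeeperConsumerConnector,
is to disable auto commit and call commitOffsets() on the connector only after
the messages pulled so far have been processed.  A rough sketch, reusing the
props and filter from the earlier snippet; the batch size and process() call
are placeholders.  Note that commitOffsets() commits the consumed position of
every stream owned by that connector, so with a single stream per connector it
matches what this thread has processed, but with multiple streams it also
covers whatever the other streams have consumed.

    props.put("auto.commit.enable", "false");        // turn off the background commit thread

    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    KafkaStream<byte[], byte[]> stream =
        connector.createMessageStreamsByFilter(filter, 1).get(0);   // one stream

    ConsumerIterator<byte[], byte[]> it = stream.iterator();
    int processedSinceCommit = 0;
    while (it.hasNext()) {
        MessageAndMetadata<byte[], byte[]> msg = it.next();
        process(msg.message());                      // placeholder for real processing
        if (++processedSinceCommit >= 100) {         // commit every 100 processed messages (illustrative)
            connector.commitOffsets();               // commits offsets of messages already handed to us
            processedSinceCommit = 0;
        }
    }
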
> > On Thu, Aug 29, 2013 at 11:58 AM, Yu, Libo <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks, Neha. That is a great answer.
> > >
> > > Regards,
> > >
> > > Libo
> > >
> > >
> > > -----Original Message-----
> > > From: Neha Narkhede [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, August 29, 2013 1:55 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: is it possible to commit offsets on a per stream basis?
> > >
> > > 1 We can create multiple connectors. From each connector create only
> > > one stream.
> > > 2 Use a single thread for a stream. In this case, the connector in each
> > > thread can commit freely without any dependence on the other threads.
> > > Is this the right way to go? Will it introduce any deadlock when
> > > multiple connectors commit at the same time?
> > >
> > > This is a better approach as there is no complex locking involved.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
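
A rough sketch of the approach being recommended here, again reusing the props
and filter from the first snippet: one connector per stream, one thread per
connector, each thread committing its own connector with no coordination
across threads.  Committing per message is only for brevity, and process() is
a placeholder.

    int numConsumers = 4;                                     // illustrative
    ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
    for (int i = 0; i < numConsumers; i++) {
        final ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        final KafkaStream<byte[], byte[]> stream =
            connector.createMessageStreamsByFilter(filter, 1).get(0);   // exactly one stream
        pool.submit(new Runnable() {
            public void run() {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                while (it.hasNext()) {
                    process(it.next().message());   // placeholder for real processing
                    connector.commitOffsets();      // each thread commits only its own connector
                }
            }
        });
    }
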
> > > On Thu, Aug 29, 2013 at 10:28 AM, Yu, Libo <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi team,
> > > >
> > > > This is our current use case:
> > > > Assume there is a topic with multiple partitions.
> > > > 1 Create a connector first and create multiple streams from the
> > > > connector for a topic.
> > > > 2 Create multiple threads, one for each stream. You can assume the
> > > > thread's job is to save the message into the database.
> > > > 3 When it is time to commit offsets, all threads have to synchronize
> > > > on a barrier before committing the offsets. This is to ensure no
> > > > message loss in case of process crash.
> > > >
> > > > As all threads need to synchronize before committing, it is not
> > > > efficient.
> > > > This is a workaround:
> > > >
> > > > 1 We can create multiple connectors. From each connector create only
> > > > one stream.
> > > > 2 Use a single thread for a stream. In this case, the connector in
> > > > each thread can commit freely without any dependence on the other
> > > > threads.  Is this the right way to go? Will it introduce any
> > > > deadlock when multiple connectors commit at the same time?
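
For comparison, a rough sketch of the barrier scheme described in the original
use case (single connector, N streams, every thread rendezvous at a barrier
and the last arrival commits), again reusing the props and filter from the
first snippet; saveToDatabase() is a placeholder.  The per-message barrier is
exactly the inefficiency being complained about, and a stream that receives no
traffic never reaches the barrier, stalling the other threads.

    final int numStreams = 4;                                 // illustrative
    final ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    List<KafkaStream<byte[], byte[]>> streams =
        connector.createMessageStreamsByFilter(filter, numStreams);

    // When all threads have arrived, the last one to arrive runs the commit.
    final CyclicBarrier barrier = new CyclicBarrier(numStreams, new Runnable() {
        public void run() {
            connector.commitOffsets();
        }
    });

    ExecutorService pool = Executors.newFixedThreadPool(numStreams);
    for (final KafkaStream<byte[], byte[]> stream : streams) {
        pool.submit(new Runnable() {
            public void run() {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                try {
                    while (it.hasNext()) {
                        saveToDatabase(it.next().message());  // placeholder for real processing
                        barrier.await();                      // wait for every other thread before the commit
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            }
        });
    }
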