Hello All,
I can think of two options for implementing the requirement below and would appreciate
some guidance on choosing between them, with their pros and cons.

Requirements:
- An in-memory rules cache to be loaded from a log-compacted Kafka topic. This
cache has to be loaded prior to the arrival of events.
- Updates to the log-compacted Kafka topic have to be tracked to keep the in-memory
rule cache up to date.

Additional properties of data:
- On job start/restart, this rule cache is always loaded from the earliest
available offset in the log.
- No Kafka offset store and restore required.
- No checkpointing needed for the rule cache, as it is loaded afresh in the
event of a crash and restore.
- No event-time semantics required, as we always want the latest rules loaded
into the cache.

Implementation Options:

1. Use a KafkaConsumer in open() to do the initial load, then continuously
fetch rule updates to keep the in-memory cache up to date. This option does not
use a DataStream for the rules, since we don't need any of the stream goodies
such as state, checkpointing, or event time. (See the first sketch after this list.)
2. Connected stream approach. Use a KafkaConsumer in open() to do the initial
load, and have a FlinkKafkaSource stream connected with the events stream. In this
case we have to take care of out-of-order updates to the cache, since rule
updates arrive both from open() and from the rules DataStream. (See the second
sketch after this list.)

--
Thanks,
-Vijay