Kafka >> mail # user >> filter before flush to disk

filter before flush to disk
Would it be possible to filter the collection before it gets flushed to disk?

Say I am tracking page views per user. I could perform a rollup before
the data gets flushed to disk (using a hashmap keyed by sessionId, and
incrementing a counter for each duplicate entry).
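
To make the rollup concrete, here is a rough sketch of what I have in mind. It is standalone Java and does not touch any Kafka code; the names `PageViewRollup` and `rollup` are made up for illustration, and it assumes each page-view event is represented only by its sessionId.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka: collapses a batch of page-view
// events into one count per session before the batch is written out.
class PageViewRollup {

    // Each element of sessionIds stands for one page-view event.
    // Returns a map from sessionId to the number of events seen for it.
    static Map<String, Integer> rollup(List<String> sessionIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sessionId : sessionIds) {
            // Insert 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(sessionId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = rollup(List.of("s1", "s2", "s1", "s1"));
        System.out.println("s1 -> " + counts.get("s1"));
        System.out.println("s2 -> " + counts.get("s2"));
    }
}
```

With something like this, four raw events for two sessions would shrink to two entries, one per session, before anything is written.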

And could this be done without modifying the original source, maybe through
some sort of event/listener?