Kafka >> mail # user >> filter before flush to disk


S Ahmed 2012-05-15, 13:38
Jay Kreps 2012-05-15, 15:24
Re: filter before flush to disk
What do you mean?

"I think the direction we are going
is instead to just let you co-locate this processing on the same box.
This gives the isolation of separate processes and the overhead of the
transfer over localhost is pretty minor."
I see what you're saying, as it is a specific implementation/use case that
diverges from a general-purpose mechanism; that's why I was suggesting maybe
a hook/event-based system.
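
To make the hook/event idea concrete, here is a minimal sketch of a buffer that runs registered listeners over each batch before it is flushed. It is not Kafka code and does not reflect any Kafka API: `BufferedLog`, `add_pre_flush_hook`, and `drop_bots` are hypothetical names, and the "disk" is just an in-memory list.

```python
from typing import Callable, List

Batch = List[dict]
PreFlushHook = Callable[[Batch], Batch]


class BufferedLog:
    """Toy in-memory buffer (hypothetical, not a Kafka class)."""

    def __init__(self, flush_size: int) -> None:
        self.flush_size = flush_size
        self.buffer: Batch = []
        self.hooks: List[PreFlushHook] = []
        self.flushed: List[Batch] = []  # stands in for the on-disk log

    def add_pre_flush_hook(self, hook: PreFlushHook) -> None:
        # Hooks run in registration order; each receives the previous output.
        self.hooks.append(hook)

    def append(self, message: dict) -> None:
        self.buffer.append(message)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        batch, self.buffer = self.buffer, []
        for hook in self.hooks:
            batch = hook(batch)
        if batch:  # a hook may filter out the whole batch
            self.flushed.append(batch)


def drop_bots(batch: Batch) -> Batch:
    """Example hook: discard messages flagged as bot traffic."""
    return [m for m in batch if not m.get("bot", False)]
```

A caller would register `drop_bots` once with `add_pre_flush_hook` and then call `append` as usual; the filtering happens without touching the buffer's own code, which is the appeal of the listener approach.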

On Tue, May 15, 2012 at 11:24 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Yeah I see where you are going with that. We toyed with this idea, but
> the idea of coupling processing to the log storage raises a lot of
> problems for general purpose usage. I think the direction we are going
> is instead to just let you co-locate this processing on the same box.
> This gives the isolation of separate processes and the overhead of the
> transfer over localhost is pretty minor.
>
> -Jay
>
> On Tue, May 15, 2012 at 6:38 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > Would it be possible to filter the collection before it gets flushed to
> > disk?
> >
> > Say I am tracking page views per user, and I could perform a rollup
> > before it gets flushed to disk (using a hashmap with the key being the
> > sessionId, and incrementing a counter for the duplicate entries).
> >
> > And could this be done w/o modifying the original source, maybe through
> > some sort of event/listener?
>
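
The rollup described in the original question can be sketched independently of where it runs, for example in a separate co-located process as suggested in the reply. This is only an illustration: it assumes each page-view event is a dict with a `sessionId` key, the function name `rollup_page_views` is hypothetical, and no Kafka client calls are shown.

```python
from collections import Counter
from typing import Dict, Iterable


def rollup_page_views(events: Iterable[dict]) -> Dict[str, int]:
    """Collapse duplicate page-view events into one count per sessionId."""
    counts: Counter = Counter()
    for event in events:
        # Assumed event shape: {"sessionId": "<id>", ...}
        counts[event["sessionId"]] += 1
    return dict(counts)


if __name__ == "__main__":
    batch = [
        {"sessionId": "s1", "page": "/home"},
        {"sessionId": "s2", "page": "/home"},
        {"sessionId": "s1", "page": "/about"},
    ]
    print(rollup_page_views(batch))
```

The rolled-up counts, one entry per session, could then be written out in place of the raw events.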
S Ahmed 2012-05-15, 15:43
S Ahmed 2012-05-17, 13:40
Jay Kreps 2012-05-17, 15:02
S Ahmed 2012-05-17, 21:32
Jay Kreps 2012-05-17, 22:34
S Ahmed 2012-05-29, 13:30