Kafka >> mail # user >> filter before flush to disk
Re: filter before flush to disk
Oh, maybe this isn't possible after all, since the log is memory-mapped to a
file and the data may already have been flushed at the OS level?

On Tue, May 15, 2012 at 11:43 AM, S Ahmed <[EMAIL PROTECTED]> wrote:

> One downside is that if my logic were messed up, I wouldn't have a window
> for rolling it back (which was one of the benefits of Kafka's design
> choice of keeping messages around for x days).
>
>
> On Tue, May 15, 2012 at 11:42 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
>> What do you mean?
>>
>> "  I think the direction we are going
>> is instead to just let you co-locate this processing on the same box.
>> This gives the isolation of separate processes and the overhead of the
>> transfer over localhost is pretty minor. "
>>
>>
>> I see what you're saying: it is a specific implementation/use case that
>> diverges from a general-purpose mechanism. That's why I was suggesting
>> maybe a hook/event-based system.
>>
>>
>> On Tue, May 15, 2012 at 11:24 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>
>>> Yeah I see where you are going with that. We toyed with this idea, but
>>> the idea of coupling processing to the log storage raises a lot of
>>> problems for general purpose usage. I think the direction we are going
>>> is instead to just let you co-locate this processing on the same box.
>>> This gives the isolation of separate processes and the overhead of the
>>> transfer over localhost is pretty minor.
>>>
>>> -Jay
>>>
>>> On Tue, May 15, 2012 at 6:38 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>>> > Would it be possible to filter the collection before it gets flushed
>>> > to disk?
>>> >
>>> > Say I am tracking page views per user, and I could perform a rollup
>>> > before it gets flushed to disk (using a hashmap with the key being the
>>> > sessionId, and incrementing a counter for duplicate entries).
>>> >
>>> > And could this be done w/o modifying the original source, maybe through
>>> > some sort of event/listener?
>>>
>>
>>
>
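The rollup Ahmed describes (a hashmap keyed by session, incrementing a counter for duplicate page-view entries) can be sketched roughly as below. The message shape and names here are hypothetical; the real record format would depend on how the producer serializes events.

```python
from collections import defaultdict

def rollup_page_views(messages):
    """Collapse duplicate page-view messages into per-session counts.

    `messages` is a hypothetical iterable of (session_id, page) tuples.
    """
    counts = defaultdict(int)  # (session_id, page) -> number of views
    for session_id, page in messages:
        counts[(session_id, page)] += 1
    # Emit one rolled-up record per (session, page) instead of one per view.
    return [(sid, page, n) for (sid, page), n in sorted(counts.items())]

views = [("s1", "/home"), ("s1", "/home"), ("s2", "/about"), ("s1", "/cart")]
print(rollup_page_views(views))
# → [('s1', '/cart', 1), ('s1', '/home', 2), ('s2', '/about', 1)]
```

Per Jay's suggestion, this logic would live in a separate consumer process co-located with the broker, reading the raw stream over localhost and producing the rolled-up stream back, rather than being hooked into the broker's flush path.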