So I'll just have to create one then, I guess, if I want to do this. I was
planning on doing this:
prod#1 -> kafka#1 -> consumer -> prod#2 -> kafka#2 (central)
kafka-central will have long-lasting messages.
So the consumer that pulls off kafka#2 will filter messages, and then I can
create an index that maps offset to messageId.
Just wondering how fast random access to a Kafka file will be, i.e. will it
be as fast as a db lookup. It's a memory-mapped file so it should be fast
in theory, but when the # of files grows things will degrade.
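To make the plan concrete, here is a rough sketch of the filtering consumer
building that offset-to-messageId index. Everything here is illustrative: the
message body format, extract_id(), and build_index() are my own assumptions,
not anything Kafka provides.

```python
# Hypothetical sketch: while consuming from kafka#2, record each message's
# id -> offset, so a later "SELECT ... WHERE id=123"-style lookup becomes
# a single offset seek instead of a scan.

def extract_id(message: bytes) -> int:
    # Assumption: the messageId is stored in the body as "id=<n>;<payload>".
    header = message.split(b";", 1)[0]
    return int(header.split(b"=", 1)[1])

def build_index(consumed):
    """consumed: iterable of (offset, message_body) pairs pulled off kafka#2."""
    index = {}
    for offset, message in consumed:
        index[extract_id(message)] = offset
    return index

# Usage: look up by id, then fetch from kafka#2 starting at that offset.
stream = [(0, b"id=101;payload-a"), (57, b"id=123;payload-b"),
          (130, b"id=207;payload-c")]
index = build_index(stream)
offset = index[123]  # -> 57
```

In practice the index would be persisted (a small db or file), but the shape
of the mapping is the same.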
On Wed, Jun 13, 2012 at 10:01 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> There is no scanning, we compute the message location from the offset and
> begin fetching there.
> Sent from my iPhone
> On Jun 13, 2012, at 6:40 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > I was thinking of replicating messages to a central location, and having
> > a very long expiry date on the messages (say, 1 year).
> > My requirement would be to not just stream messages, but to access
> > messages by key, similar to a "SELECT * FROM TABLE WHERE id=123".
> > From what I understand, there is currently no index file that maps
> > messages to their exact location in a file, correct? i.e. Kafka streams
> > the messages, so it goes to a .kafka file, starts from the beginning, and
> > streams them to a consumer. If your offset happens to be in the middle of
> > the file, it will scan the file, start at the beginning of a message,
> > figure out the length of the message, and then jump to the position of
> > the next message until it finds the correct message offset. Is this correct?
> > i.e. I would have to create some sort of index that maps the offset to
> > 'messageId' (where the messageId is stored in the body of the message
> > itself).
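Jay's point about computing the location from the offset (rather than
scanning) can be sketched like this, assuming offsets are byte positions and
log segment files are named by their base offset. The function name and
structure are my own illustration, not Kafka's actual code.

```python
import bisect

def locate(offset, segment_bases):
    """Return (segment_base, position_within_segment) for a target offset.

    Pick the segment whose base offset is the largest one <= the target,
    then the position within that segment is just the difference -- no
    scanning of message boundaries is needed.
    """
    segment_bases = sorted(segment_bases)
    i = bisect.bisect_right(segment_bases, offset) - 1
    base = segment_bases[i]
    return base, offset - base

# Usage: three segments starting at byte 0, 1024, and 2048.
base, pos = locate(1500, [0, 1024, 2048])  # -> (1024, 476)
```

So the fetch can begin directly at that position in the segment file, which
is why the offset-based read stays fast regardless of file size.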