Re: random access
So I guess I'll just have to create one myself if I want to do this.  I was
planning on the following:

prod#1 -> kafka#1 -> consumer  -> prod#2 -> kafka#2 central

kafka#2 (central) will hold long-lasting messages.

The consumer that pulls from kafka#2 will filter messages, and
then I can create an index that maps offset to messageId.
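
A minimal sketch of such an index, assuming it is kept in memory and that the consumer already knows each message's offset and has parsed the messageId out of the body. The class and method names here are hypothetical and not part of Kafka. For a lookup by key the map has to go from messageId to offset, so that is the direction used:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical in-memory index from messageId to Kafka offset (not a Kafka API). */
final class MessageIndex {
    private final Map<String, Long> offsetById = new HashMap<>();

    /** Record the offset at which the message with this id was consumed. */
    void put(String messageId, long offset) {
        offsetById.put(messageId, offset);
    }

    /** Return the offset to fetch from, or empty if the id was never indexed. */
    OptionalLong lookup(String messageId) {
        Long offset = offsetById.get(messageId);
        return offset == null ? OptionalLong.empty() : OptionalLong.of(offset);
    }
}
```

The consumer would call put() for every message that passes the filter; a reader would call lookup() and then issue a fetch at the returned offset. With a one-year retention the map would probably need to be persisted somewhere rather than held in memory.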

Just wondering how fast random access to a Kafka file will be, e.g. whether it
will be as fast as a DB lookup.  It's a memory-mapped file, so it should be fast
in theory, but when the number of files grows things will degrade.

On Wed, Jun 13, 2012 at 10:01 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> There is no scanning; we compute the message location from the offset and
> begin fetching there.
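
To illustrate the "compute the message location from the offset" step in the reply above, here is a rough sketch. It assumes that an offset is a byte position in the overall log and that each segment file is identified by the offset at which it starts. The class and method names are hypothetical; this is not Kafka's actual code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of offset-to-location arithmetic; names are hypothetical, not Kafka's API. */
final class SegmentLocator {
    /** A byte position inside one segment file. */
    static final class Location {
        final String fileName;
        final long position;

        Location(String fileName, long position) {
            this.fileName = fileName;
            this.position = position;
        }
    }

    // Maps each segment's starting offset to its file name, kept sorted by offset.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset, String fileName) {
        segments.put(baseOffset, fileName);
    }

    /**
     * Pick the segment with the largest starting offset that is <= the requested
     * offset, then subtract to get the position inside that file.
     * Note: this sketch does not check that the offset lies before the end of the log.
     */
    Location locate(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + offset + " precedes the first segment");
        }
        return new Location(entry.getValue(), offset - entry.getKey());
    }
}
```

Under these assumptions, finding the right file is a sorted-map lookup that grows logarithmically with the number of segment files, and no message in the file has to be read before the requested position.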
> On Jun 13, 2012, at 6:40 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > I was thinking of replicating messages to a central location, and having a
> > very long expiry on the messages (say, 1 year).
> >
> > My requirement would be to not just stream messages, but also access
> > messages by key, similar to a "SELECT * FROM TABLE WHERE id=123".
> >
> > From what I understand, there is currently no index file that maps messages to
> > their exact location in a file, correct?  i.e. Kafka streams the messages,
> > so it goes to a .kafka file, starts from the beginning and streams the data
> > to a consumer.  If your offset happens to be in the middle of the file, it
> > will scan the file: start at the beginning of a message, figure out the
> > length of the message, and then jump to the position of the next message
> > until it finds the correct message offset.  Is this correct?
> >
> > i.e. I would have to create some sort of index that maps the offset to the
> > 'messageId' (where the messageId is stored in the body of the message
> > itself).