Kafka >> mail # user >> random access


Re: random access
So I guess I'll just have to create one myself if I want to do this. I was
planning on something like this:

prod#1 -> kafka#1 -> consumer  -> prod#2 -> kafka#2 central

kafka#2 (central) will retain messages for a long time.

The consumer that pulls from kafka#2 will filter messages, and then I can
build an index that maps offset to messageId.
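
A minimal sketch of the index I have in mind, in Python for brevity. Everything here is an assumption for illustration: the message bodies are JSON with a "messageId" field, and the consumer hands back (offset, payload) pairs. The function name build_index is hypothetical, not part of any Kafka client. The map is keyed by messageId, since that is the direction needed for a lookup by id.

```python
import json


def build_index(messages):
    """Build a messageId -> offset map from (offset, payload_bytes) pairs."""
    index = {}
    for offset, payload in messages:
        # Assumption: the body is UTF-8 JSON carrying its own "messageId".
        body = json.loads(payload.decode("utf-8"))
        index[body["messageId"]] = offset
    return index


# Hypothetical stream, shaped the way a consumer might yield it.
stream = [
    (0, b'{"messageId": 123, "text": "first"}'),
    (41, b'{"messageId": 456, "text": "second"}'),
]
index = build_index(stream)
```

With that in place, the "WHERE id=123" case becomes index[123], which gives the offset to fetch from. In practice the map would need to live in something persistent rather than an in-memory dict.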

Just wondering how fast random access to a Kafka file will be, i.e. whether
it will be as fast as a DB lookup. It's a memory-mapped file, so it should be
fast in theory, but I'd expect things to degrade as the number of files grows.
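
To make the question concrete, here is a rough sketch of how I picture the lookup Jay describes in the reply quoted below. It rests on assumptions I have not verified against the Kafka source: that the offset is a byte position in the log, and that each segment file is named after the offset of its first byte. The 4-byte length prefix is a simplified stand-in for the real on-disk message format, and mmap is used only because I mentioned it above. The names locate, read_message, frame and segment_path are made up for this example.

```python
import bisect
import mmap
import os
import struct
import tempfile


def locate(segment_starts, offset):
    """Return (segment_start, position_in_segment) for a global byte offset.

    segment_starts must be sorted in ascending order.
    """
    i = bisect.bisect_right(segment_starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first segment")
    return segment_starts[i], offset - segment_starts[i]


def read_message(path, position):
    """Read one length-prefixed message at a byte position via mmap."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (size,) = struct.unpack(">i", mm[position:position + 4])
            return bytes(mm[position + 4:position + 4 + size])


def frame(payload):
    """Simplified framing: 4-byte big-endian length, then the payload."""
    return struct.pack(">i", len(payload)) + payload


def segment_path(directory, start):
    """Assumed naming scheme: zero-padded start offset plus '.kafka'."""
    return os.path.join(directory, "%020d.kafka" % start)


with tempfile.TemporaryDirectory() as d:
    seg0 = frame(b"alpha") + frame(b"bravo")
    seg1 = frame(b"charlie")
    starts = [0, len(seg0)]
    for start, data in zip(starts, (seg0, seg1)):
        with open(segment_path(d, start), "wb") as f:
            f.write(data)

    # Offset 9 is where the second message of the first segment begins.
    start, pos = locate(starts, 9)
    found = read_message(segment_path(d, start), pos)
```

If that picture is right, finding the segment is a binary search over the start offsets, so the cost grows with the log of the number of files rather than linearly. Whether the read itself is as fast as a DB lookup would then mostly depend on whether the page is already in the OS cache.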

On Wed, Jun 13, 2012 at 10:01 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> There is no scanning, we compute the message location from the offset and
> begin fetching there.
>
> On Jun 13, 2012, at 6:40 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
> > I was thinking of replicating messages to a central location, and having
> > a very long expiry date on the messages (say, 1 year).
> >
> > My requirement would be to not just stream messages, but also access
> > messages by key, similar to a "SELECT * FROM TABLE WHERE id=123".
> >
> > From what I understand, there is currently no index file that maps
> > messages to their exact location in a file, correct? i.e. Kafka streams
> > the messages, so it goes to a .kafka file, starts from the beginning and
> > streams the data to a consumer. If your offset happens to be in the
> > middle of the file, it will scan the file: start at the beginning of a
> > message, figure out the length of the message, and then jump to the
> > position of the next message until it finds the correct message offset.
> > Is this correct?
> >
> > i.e. I would have to create some sort of index that maps the offset to
> > the 'messageId' (where the messageId is stored in the body of the
> > message itself).
>