Kafka, mail # user - since flushes are batched, is it still io intensive?


Re: since flushes are batched, is it still io intensive?
S Ahmed 2012-05-12, 01:21
then why have an operations page? j/k

thanks Jay!

Just a note, and I hope nobody takes it the wrong way, but I was looking at
the flume project and I really appreciated how many comments they had in
their code.
Scala is already a bit cryptic, so comments would go a long way for newbies :)

On Fri, May 11, 2012 at 7:10 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> No, Linux does the right thing by default. We have an operations page
> on the site that gives all the details on our setup, but there is
> nothing to set up.
>
> -Jay
>
> On Fri, May 11, 2012 at 9:16 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > Do you tune the o/s dedicated memory for page cache?  Or that's all
> > automatic....
> >
> > It would be cool if linkedin posted some of their server level tweaks if
> > that is critical to getting the most out of zero copy and kafka in general
> > :)
> >
> > On Fri, May 11, 2012 at 12:10 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> >> Memory required for JVM is also low (2-4GB heap size). Most of the memory
> >> is used for pagecache.
> >>
> >> Jun
> >>
> >> On Fri, May 11, 2012 at 8:03 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >>
> >> > What about memory?  I know you guys have 24GB of ram per server?
> >> >
> >> > Basically I'm juggling between going with a dedicated box (which has faster
> >> > IO), or ec2 which has slower IO but cheaper on the ram side (way cheaper!).
> >> >
> >> > On Fri, May 11, 2012 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > It all depends on the volume of the data. At LinkedIn, we observed that the
> >> > > io load on a typical Kafka broker is not high.
> >> > >
> >> > > Jun
> >> > >
> >> > > On Fri, May 11, 2012 at 7:13 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > I was thinking (and after doing some tests on dedicated and ec2), would you
> >> > > > still say kafka is io intensive?
> >> > > >
> >> > > > Considering writes are batched every x seconds, and you have a single kafka
> >> > > > server on a given instance, and consumers are just streaming the data in
> >> > > > sequential order (the disk head isn't jumping around), is it safe to say
> >> > > > kafka isn't that io intensive to the point that running it on ec2 should be
> >> > > > just as good as dedicated hardware?
> >> > > >
> >> > > > I was getting pretty good results on ec2 so this thought came to me...
> >> > > >
> >> > >
> >> >
> >>
>
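
For anyone reading this thread later, here is a rough sketch of the idea being discussed: appends go sequentially to the end of a file and land in the OS page cache, and the expensive fsync only happens once per batch (every N messages or every x seconds). This is not Kafka's actual code. The class name BatchedLog and the parameters flush_every_msgs and flush_every_secs are made up for illustration, and the 4-byte length prefix is just an assumed framing.

```python
import os
import time


class BatchedLog:
    """Append-only log that fsyncs in batches (hypothetical, for illustration only)."""

    def __init__(self, path, flush_every_msgs=500, flush_every_secs=1.0):
        self._file = open(path, "ab")
        self._flush_every_msgs = flush_every_msgs
        self._flush_every_secs = flush_every_secs
        self._unflushed = 0
        self._last_flush = time.monotonic()
        self.fsync_count = 0  # how many times we actually forced data to disk

    def append(self, payload: bytes) -> None:
        # Sequential append: a 4-byte length prefix followed by the payload.
        # The bytes sit in the process buffer / OS page cache until flushed.
        self._file.write(len(payload).to_bytes(4, "big") + payload)
        self._unflushed += 1
        overdue = time.monotonic() - self._last_flush >= self._flush_every_secs
        if self._unflushed >= self._flush_every_msgs or overdue:
            self.flush()

    def flush(self) -> None:
        if self._unflushed == 0:
            return
        self._file.flush()             # push the Python buffer to the OS
        os.fsync(self._file.fileno())  # ask the OS to write it to disk
        self.fsync_count += 1
        self._unflushed = 0
        self._last_flush = time.monotonic()

    def close(self) -> None:
        self.flush()
        self._file.close()
```

With flush_every_msgs=100 and a long time interval, 250 appends cost two fsyncs plus one more on close, rather than 250. The trade-off is that messages not yet flushed can be lost if the machine crashes.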