Kafka >> mail # user >> Our scenario and couple of questions


Re: Our scenario and couple of questions
Great, thanks a lot!

On 16 October 2012 18:45, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> >> *Question 1*: If each broker has one topic and one partition, and I want
> to implement a partitioned producer (in PHP), I still have 8 partitions in
> total, correct?
>
> Correct
>
> >> *Question 2*: In future I may have multiple event tracking clusters
> which I want mirrored onto a single topic in the central cluster. Is this
> kind of mirroring possible with 0.7.x?
>
> This is available in 0.7.1 onwards
>
> >> *Question 3*: If I want the low-level PHP producer to batch & zip 10
> messages like the async Scala/Java producer does, all I have to do is
> send a message that is a message set containing all 10 messages,
> correct?
>
> Yes, provided you conform with the format of a compressed message -
> https://cwiki.apache.org/confluence/display/KAFKA/Compression
>
> >> *Question 4*: This system is quite likely to go into production in the
> coming weeks, and I prefer staying with 0.7.x because it's simpler for
> non-Java clients, but would you advise me to build on 0.8.x, and why?
>
> Recommend staying on 0.7.x since it is stable. If your requirements
> include message replication, durability and guaranteed delivery,
> you might want to wait until 0.8 is released. The wire protocol has
> changed considerably in 0.8.
>
> Thanks,
> Neha
>
> On Tue, Oct 16, 2012 at 10:34 AM, Michal Haris
> <[EMAIL PROTECTED]> wrote:
> > Hi everyone,
> >
> > *Our current situation (without Kafka)*
> >
> > - We currently have 8 event tracker servers that together can handle
> > 8000 HTTP events / second, but a normal day's peak throughput is about
> > 1250 messages / second.
> > - Messages are basically HTTP events enriched by various Apache mods and
> > transformations, eventually written into log files.
> > - Each event is approx. 0.5 KB when packed as JSON.
> > - These message logs are compressed and shipped to S3 every 5 minutes,
> > where they are used by Hive and other Hadoop jobs.
> > - Pretty standard.
> >
> > *My plan is to introduce a Kafka system on top of the existing offline
> > log-processing.*
> >
> > I have a simulated event stream and have written a Hadoop job similar to
> > the ETL consumer in trunk, except that I keep the offsets in ZooKeeper
> > and the output files are partitioned by date directory.
> > In the first phase I am going to install a Kafka broker on each of the 8
> > tracker servers and simply run tail | php producer.php on each of them,
> > with the PHP code publishing into the local broker node under a single
> > topic. In total there will be a cluster of 8 Kafka servers with a 3- or
> > 5-node ZooKeeper ensemble interlaced on the same hardware. This topic is
> > going to be mirrored into a central Kafka cluster where the hadoop-loader
> > job will run every 30 min or so.
> >
> > *Question 1*: If each broker has one topic and one partition, and I want
> > to implement a partitioned producer (in PHP), I still have 8 partitions in
> > total, correct?
> > *Question 2*: In future I may have multiple event tracking clusters
> > which I want mirrored onto a single topic in the central cluster. Is this
> > kind of mirroring possible with 0.7.x?
> > *Question 3*: If I want the low-level PHP producer to batch & zip 10
> > messages like the async Scala/Java producer does, all I have to do is
> > send a message that is a message set containing all 10 messages,
> > correct?
> > *Question 4*: This system is quite likely to go into production in the
> > coming weeks, and I prefer staying with 0.7.x because it's simpler for
> > non-Java clients, but would you advise me to build on 0.8.x, and why?
> >
> >
> > Thanks a lot
> > --
> > Michal Haris
> > Software Engineer
> >
> > VisualDNA | 7 Moor Street, London, W1D 5NB
> > www.visualdna.com | t: +44 (0) 207 734 7033
>
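
For reference, the counting in Question 1 (8 brokers, one partition each, so 8 partitions in total) can be sketched roughly as below. This is only an illustration in Python, not the PHP producer itself: the function names are hypothetical, and the CRC32-based key hashing is an arbitrary stand-in, not Kafka's own partitioner.

```python
import zlib

def broker_partitions(broker_ids, partitions_per_broker=1):
    """List every (broker_id, partition) pair for the topic, in a stable order."""
    return [(b, p) for b in sorted(broker_ids) for p in range(partitions_per_broker)]

def pick_partition(key, partitions):
    """Map a key to one pair with a stable hash, so equal keys share a partition.

    CRC32 is an assumption made for this sketch; any stable hash would do.
    """
    return partitions[zlib.crc32(key) % len(partitions)]

# 8 brokers with one partition each give 8 (broker, partition) pairs in total.
partitions = broker_partitions(range(8))
target = pick_partition(b"user-42", partitions)
```

A producer built this way would send each message to the broker named in the chosen pair, so the total partition count is simply brokers times partitions per broker.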

--
Michal Haris
Software Engineer

VisualDNA | 7 Moor Street, London, W1D 5NB
www.visualdna.com | t: +44 (0) 207 734 7033
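
For Question 3, here is a rough Python sketch of what "a message that is a message set" could look like on the wire. It assumes the 0.7 layout as I understand it: a 4-byte length, a magic byte, an attributes byte carrying the codec, a CRC32 of the payload, then the payload. The constant values and function names are assumptions for illustration and should be checked against the Compression wiki page linked in the thread before porting to PHP. The produce-request framing around the message is not shown.

```python
import gzip
import struct
import zlib

MAGIC_WITH_ATTRIBUTES = 1  # assumed: magic 1 means an attributes byte follows
CODEC_NONE = 0             # assumed codec id for "not compressed"
CODEC_GZIP = 1             # assumed codec id for gzip

def encode_message(payload, codec=CODEC_NONE):
    """One length-prefixed message: size, magic, attributes, CRC32(payload), payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BBI", MAGIC_WITH_ATTRIBUTES, codec, crc) + payload
    return struct.pack(">I", len(body)) + body

def encode_message_set(payloads):
    """A message set is the encoded messages laid end to end."""
    return b"".join(encode_message(p) for p in payloads)

def encode_compressed_batch(payloads):
    """Wrap a gzipped message set as the payload of a single outer message."""
    inner = encode_message_set(payloads)
    return encode_message(gzip.compress(inner), CODEC_GZIP)

def decode_message_set(data):
    """Inverse of the encoders above, used here only as a sanity check."""
    out, pos = [], 0
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        _magic, codec, crc = struct.unpack_from(">BBI", data, pos + 4)
        payload = data[pos + 10 : pos + 4 + size]
        if zlib.crc32(payload) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch")
        if codec == CODEC_GZIP:
            out.extend(decode_message_set(gzip.decompress(payload)))
        else:
            out.append(payload)
        pos += 4 + size
    return out

# Batch 10 small JSON events into one compressed message.
events = [b'{"n": %d}' % i for i in range(10)]
batch = encode_compressed_batch(events)
```

The point of the sketch is the nesting: the 10 messages are serialized as an ordinary message set, that byte string is gzipped, and the result becomes the payload of one outer message whose attributes byte names the codec.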