Re: Our scenario and couple of questions
Great, thanks a lot!

On 16 October 2012 18:45, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> >> *Question 1*: If each broker has one topic and one partition, and I want
> >> to implement a partitioned producer (in PHP), I still have 8 partitions
> >> in total, correct?
>
> Correct
>
> >> *Question 2*: In future I may have multiple event tracking clusters
> >> which I want mirrored onto a single topic in the central cluster. Is this
> >> kind of mirroring possible with 0.7.x?
>
> This is available in 0.7.1 onwards
>
> >> *Question 3*: If I want the low-level PHP producer to batch & zip 10
> >> messages like the async Scala/Java producer does, all I have to do is
> >> send a message that is a message set containing all 10 messages,
> >> correct?
>
> Yes, provided you conform with the format of a compressed message -
> https://cwiki.apache.org/confluence/display/KAFKA/Compression
>
> >> *Question 4*: This system is quite likely to go into production in the
> >> next few weeks, and I prefer staying with 0.7.x because it's simpler for
> >> non-Java clients, but would you advise me to build on 0.8.x, and why?
>
> Recommend staying on 0.7.x since it is stable. If your requirements
> include message replication, durability and guaranteed delivery,
> you might want to wait until 0.8 is released. The wire protocol has
> changed considerably in 0.8.
>
> Thanks,
> Neha
>
> On Tue, Oct 16, 2012 at 10:34 AM, Michal Haris
> <[EMAIL PROTECTED]> wrote:
> > Hi everyone,
> >
> > *Our current situation (without Kafka)*
> >
> > - At the moment we have 8 event tracker servers that in total can handle
> > 8000 HTTP events/second, but the normal daily peak throughput is about
> > 1250 messages/second.
> > - Messages are basically HTTP events enriched by various Apache mods and
> > transformations, eventually written into log files.
> > - Each event is about 0.5 KB when packed as JSON.
> > - These message logs are compressed and shipped to S3 every 5 minutes,
> > where they are used by Hive and other Hadoop jobs.
> > - Pretty standard.
> > *My plan is to introduce a Kafka system on top of the existing offline
> > log processing.*
> >
> > I have a simulated event stream and have written a Hadoop job similar to
> > the ETL consumer in trunk, except that I keep the offsets in ZooKeeper
> > and the output files are partitioned by date directory.
> > In the first phase I am going to install a Kafka broker on each of the 8
> > tracker servers and simply run tail | php producer.php on each of them,
> > so that the PHP code publishes into the local broker node under a single
> > topic. In total there will be a cluster of 8 Kafka servers with a 3- or
> > 5-node ZooKeeper ensemble interlaced on the same hardware. This topic is
> > going to be mirrored into a central Kafka cluster where the hadoop-loader
> > job will run every 30 minutes or so.
> >
> > *Question 1*: If each broker has one topic and one partition, and I want
> > to implement a partitioned producer (in PHP), I still have 8 partitions
> > in total, correct?
> > *Question 2*: In future I may have multiple event tracking clusters
> > which I want mirrored onto a single topic in the central cluster. Is this
> > kind of mirroring possible with 0.7.x?
> > *Question 3*: If I want the low-level PHP producer to batch & zip 10
> > messages like the async Scala/Java producer does, all I have to do is
> > send a message that is a message set containing all 10 messages,
> > correct?
> > *Question 4*: This system is quite likely to go into production in the
> > next few weeks, and I prefer staying with 0.7.x because it's simpler for
> > non-Java clients, but would you advise me to build on 0.8.x, and why?
> >
> >
> > Thanks a lot
> > --
> > Michal Haris
> > Software Engineer
> >
> > VisualDNA | 7 Moor Street, London, W1D 5NB
> > www.visualdna.com | t: +44 (0) 207 734 7033
>
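
To make the Question 1 answer concrete: with 8 brokers each hosting one partition of the topic, a partitioned producer has 8 (broker, partition) targets to choose from. The following is only a sketch in Python rather than PHP. The broker ids 0..7, the name choose_partition and the use of CRC32 as the hash are all assumptions for illustration, not what the Kafka producers necessarily do.

```python
import zlib

# Assumption: 8 brokers with ids 0..7, each hosting partition 0 of the topic.
BROKER_PARTITIONS = [(broker_id, 0) for broker_id in range(8)]


def choose_partition(key: bytes, broker_partitions=BROKER_PARTITIONS):
    """Map a message key to one (broker_id, partition) pair.

    Hypothetical helper: CRC32 is used only because it is stable across
    processes and machines, unlike Python's built-in hash().
    """
    index = zlib.crc32(key) % len(broker_partitions)
    return broker_partitions[index]


if __name__ == "__main__":
    # The same key always lands on the same broker/partition.
    print(choose_partition(b"visitor-123"))
```

Keying by something like a visitor id keeps all events for that visitor in one partition, which matters only if the consumer relies on per-key ordering.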

--
Michal Haris
Software Engineer

VisualDNA | 7 Moor Street, London, W1D 5NB
www.visualdna.com | t: +44 (0) 207 734 7033
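
For reference, the batching approach from Question 3 can be sketched as follows, in Python rather than PHP. The byte layout is an assumption based on my reading of the 0.7 message format (4-byte length, 1-byte magic, 1-byte attributes holding the compression codec, 4-byte CRC32 of the payload, then the payload) and should be checked against the Compression wiki page linked above. All function names here are hypothetical.

```python
import gzip
import struct
import zlib

MAGIC = 1        # assumed: message format version that carries an attributes byte
CODEC_NONE = 0   # assumed codec id for "no compression"
CODEC_GZIP = 1   # assumed codec id for gzip


def encode_message(payload: bytes, codec: int = CODEC_NONE) -> bytes:
    """Encode one size-prefixed message: length, magic, attributes, CRC32, payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BBI", MAGIC, codec, crc) + payload
    return struct.pack(">I", len(body)) + body


def encode_message_set(payloads) -> bytes:
    """A message set is the concatenation of size-prefixed messages."""
    return b"".join(encode_message(p) for p in payloads)


def encode_compressed_batch(payloads) -> bytes:
    """Wrap a gzip-compressed message set as the payload of a single message."""
    inner = encode_message_set(payloads)
    return encode_message(gzip.compress(inner), CODEC_GZIP)


if __name__ == "__main__":
    events = [('{"event": %d}' % i).encode() for i in range(10)]
    batch = encode_compressed_batch(events)
    print(len(batch), "bytes for 10 events")
```

The wrapper message is what goes into the produce request; the consumer side is expected to see the gzip codec in the attributes byte and unpack the inner message set.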