Kafka, mail # user - Trade-off between topics and partitions?


Re: Trade-off between topics and partitions?
Benjamin Black 2013-12-06, 05:33
Deja vu!

IMO, what you are describing is a database problem, even though you are
talking/thinking about it as a queue problem. I'm sure you could construct
something using Kafka (and Samza), but I think you'd have an easier time
with a database. The number of pending messages per user and the average
message size would be critical in selecting exactly which sort of database
to use.
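
To make the database framing concrete, here is a minimal sketch of a per-user ordered inbox. It assumes SQLite purely for illustration; the table name `inbox` and the helpers `open_inbox`, `publish`, and `fetch_pending` are hypothetical, and a real deployment would pick the store based on the pending-message count and message size mentioned above.

```python
import sqlite3


def open_inbox(path=":memory:"):
    """Open (or create) a hypothetical per-user inbox table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # increasing insert order
        " user_id TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # Index so that one user's pending messages can be read in order.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS inbox_user ON inbox (user_id, seq)"
    )
    return conn


def publish(conn, user_id, body):
    """Append one message for a user; the transaction commits on exit."""
    with conn:
        conn.execute(
            "INSERT INTO inbox (user_id, body) VALUES (?, ?)",
            (user_id, body),
        )


def fetch_pending(conn, user_id, after_seq=0):
    """Return (seq, body) pairs for a user, in publish order, after a cursor."""
    rows = conn.execute(
        "SELECT seq, body FROM inbox"
        " WHERE user_id = ? AND seq > ? ORDER BY seq",
        (user_id, after_seq),
    )
    return rows.fetchall()
```

A user who reconnects would call `fetch_pending` with the last `seq` it saw, so disconnected users simply accumulate rows until they return.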

My $0.02.

On Thu, Dec 5, 2013 at 7:47 PM, mission mission <[EMAIL PROTECTED]> wrote:

> Hello,
>
> According to the Kafka FAQ "How do I choose the number of partitions for a
> topic", clusters with more than 10K partitions are not tested. I am looking
> for advice on how to scale the number of partitions beyond that. My use
> case is to publish messages to 1 million users, each with a unique user
> id. Users are not always connected but a user must receive published
> messages in order.
>
> What is the best way to divide topics and partitions for this use case? Do
> I need 1 million partitions? The FAQ seems to think so, i.e. "if we were
> storing notifications for users we would encourage a design with a single
> notifications topic partitioned by user id". But the FAQ implies strongly
> that 1 million partitions may wreak havoc on zookeeper because they will
> lead to X million znodes that have to be stored in memory. Any suggestions?
>
> Thanks,
>
> mission
>
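
As a rough illustration of the FAQ's "single topic partitioned by user id" idea: partitioning by user id does not need one partition per user. Hashing the id onto a fixed number of partitions keeps every message for a given user in the same partition, which is what preserves per-user order. The sketch below is not Kafka's own partitioner; the CRC32 hash, the count of 1024, and the name `partition_for` are assumptions chosen for the example.

```python
import zlib

# Assumed partition count, far below the 10K figure quoted from the FAQ.
NUM_PARTITIONS = 1024


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user id to a stable partition index in [0, num_partitions)."""
    # CRC32 is a stand-in for whatever hash the producer actually uses.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions
```

Many users share each partition under this scheme, so a consumer serving one user has to filter out the other users' messages in that partition.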