
I read the Apache Kafka documentation, but I couldn't find an example of how many partitions to use in a given scenario.

For example, let's say I have 5000 msgs/entries per minute. For this situation, how many partitions should I have (or would you recommend)?

Or is there a way to calculate this? Maybe there's a table of values I can refer to?

Diego Velez

1 Answer


There is no good default number of partitions; it depends on information you haven't provided.

It depends on the size of your messages, your platform, and the usage pattern. Can one server store all messages for the configured retention period? If not, you should split the data across several partitions (and server instances). The same goes if you need better throughput. You should also consider whether messages must be processed sequentially or can be consumed with no particular ordering constraint, and what latency you expect between a message being produced and consumed. Finally, if your messages matter, you'll have to add replicas for each partition and acknowledge every message on all replicas, which slows down throughput.
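For instance, here is a minimal sketch of creating a topic with an explicit partition count and replication factor, using Kafka's Java AdminClient; the topic name "events", the broker address, and the specific numbers are placeholders, not recommendations:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; point this at your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions allow up to 6 consumers of one group to work in parallel;
            // replication factor 3 keeps each partition available if a broker dies,
            // at the cost of extra storage and slower fully-acknowledged writes.
            NewTopic topic = new NewTopic("events", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```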

You also need to specify whether the number you gave refers to messages produced or consumed.

5000 messages per minute is very low, considering Kafka is built to process messages fast. I easily reached 10000 messages per second injected per server, with 1 kB messages.
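Kafka ships a perf tool for exactly this kind of measurement (see the comments below), but as a rough sketch of the idea with the plain Java producer, assuming a placeholder broker address and topic name:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerThroughputSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // acks=all waits for every replica, trading throughput for durability,
        // as discussed above; acks=1 or acks=0 is faster but less safe.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        byte[] payload = new byte[1024]; // 1 kB per message, as in the test above
        int numRecords = 100_000;

        long start = System.currentTimeMillis();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < numRecords; i++) {
                producer.send(new ProducerRecord<>("events", payload));
            }
            producer.flush(); // wait until everything has actually been sent
        }
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.printf("%d msgs in %d ms = %.0f msgs/s%n",
                numRecords, elapsedMs, numRecords * 1000.0 / elapsedMs);
    }
}
```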

5000 messages per minute is roughly 84 messages per second, so if one instance of your consumer application can handle that rate you're good; otherwise, consider adding partitions and running several consumer application instances in parallel, each responsible for one partition.
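As a sketch of that consumer-side scaling (topic "events" and group id "my-app" are again placeholders): run several copies of the loop below with the same group.id, and Kafka assigns each instance a share of the partitions, so the partition count is the upper bound on your parallelism.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumerWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Every instance started with this group.id splits the topic's
        // partitions with the others; one partition goes to exactly one instance.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Handle one message; ordering is only guaranteed within a partition.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```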

Confluent Inc. has published a blog post about how to choose the number of partitions (and the number of replicas, too).

  • Hi baptistemm. First of all, thank you very much for the article, it is very helpful. And if you don't mind me asking, what is your setup, and how did you manage all those messages/sec? (I mean, how many partitions and brokers do you have?) – Diego Velez Aug 05 '16 at 16:51
  • I corrected my answer because it was 10000 messages, not 100000. I had 5 virtual servers in the cluster, each with 8 cores and 16 GB of RAM. As a production engineer I focused my test on reliability, not on performance; that's the developers' job. I created some topics with 5 partitions and 3 replicas, and tested injection with a message simulator created by our developers and with the kafka-producer-perf-test.sh tool (provided by default). So nothing special, and I didn't tune Kafka parameters. – Baptiste Mille-Mathias Aug 05 '16 at 18:14
  • That's impressive, thank you very much for your help. One last question: you said that each message was 1 kB in size? Or was the entire batch of 10000 msgs 1 kB? – Diego Velez Aug 05 '16 at 19:06
  • Each message was 1 kB. You should read this page https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines if you haven't already – Baptiste Mille-Mathias Aug 05 '16 at 19:07
  • Awesome, thank you very much, you were very helpful – Diego Velez Aug 05 '16 at 19:10