Kafka log retention not working as expected

Question

Note: I am aware that there are many similar questions. I have already looked at them, and they didn't help.

I am running a Kafka benchmark test in a closed environment. I gave each broker a 40GB file system for log persistence, and I quickly reach a point where that file system fills up and the broker crashes and does not come back up.

So, to avoid such a catastrophic failure in production, I set log.retention.bytes=10737418240 (10GB) and ran the test again to see whether Kafka deletes logs before the file system fills up and the broker crashes.

As shown in the graph, Kafka didn't delete anything after passing 10GB (in other tests, it also went all the way to 40GB and crashed again).

Here is my entire server.properties file:

# server settings
controlled.shutdown.enable=true
log.retention.bytes=10737418240
log.cleanup.policy=delete
log.segment.delete.delay.ms=10000
log.retention.check.interval.ms=1000
listeners=PLAINTEXT://:9092

zookeeper.connect=kafka-zookeeper:2181
zookeeper.session.timeout.ms=6000

# this must correlate to kafka's volume claim templates
log.dirs=/var/lib/kafka

# default settings for topics
auto.create.topics.enable=true
delete.topic.enable=true
num.partitions=10
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=2
default.replication.factor=2
# 6291456 / 1024 / 1024 = 6 MiB
replica.fetch.max.bytes=6291456
# 5242880 / 1024 / 1024 = 5 MiB
message.max.bytes=5242880
group.initial.rebalance.delay.ms=3000

What am I missing?

Answer (edited Jun 11 '20 at 10:02)

The Kafka documentation says this about the topic-level retention.bytes setting (server default property: log.retention.bytes):

"Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes."
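A toy model makes this partition-level enforcement concrete. It is a simplified sketch, not Kafka's implementation: the segment sizes are made up, the helper names (apply_retention, total_bytes) are hypothetical, and Kafka's exact rule for when a segment becomes deletable may differ. The point it shows is that each partition is trimmed on its own, so ten partitions each capped at 10GB still hold close to 100GB in total.

```python
# Toy model, NOT Kafka's code: a partition is a list of segment sizes in bytes,
# oldest first, and the last entry is the active segment.
GiB = 1024 ** 3


def apply_retention(segments: list[int], retention_bytes: int) -> list[int]:
    """Drop whole oldest segments while this one partition exceeds the limit."""
    kept = list(segments)
    # The limit is checked per partition; the active segment is always kept in this model.
    while len(kept) > 1 and sum(kept) > retention_bytes:
        kept.pop(0)  # delete the oldest closed segment
    return kept


def total_bytes(partitions: list[list[int]]) -> int:
    """Sum the sizes of all segments across all partitions of a topic."""
    return sum(sum(p) for p in partitions)


limit = 10 * GiB  # log.retention.bytes from the question
# Hypothetical topic: 10 partitions, each with 20 closed 1 GiB segments and a half-full active one.
partitions = [[GiB] * 20 + [GiB // 2] for _ in range(10)]
trimmed = [apply_retention(p, limit) for p in partitions]

print(total_bytes(trimmed) / GiB)  # → 95.0 (each partition keeps 9.5 GiB)
```

Every partition respects the limit, yet the topic as a whole sits near 10 × log.retention.bytes, and with a replication factor of 2 the cluster stores that amount twice.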

Assuming you use the default configuration when creating your topics, you create N topics during your benchmark test, and you run a cluster of M healthy brokers, we can roughly estimate how large the partition logs on a single broker can grow before Kafka starts deleting old log segments:

log.retention.bytes × num.partitions × N × default.replication.factor / M,

which, with your configuration, comes out larger than your file system:

10GB × 10 × N × 2 / M = 200GB × N / M.
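The formula above can be checked, and inverted to pick a safer log.retention.bytes, with a few lines of Python. This is a rough sketch: the function names are hypothetical, N = 1 topic and M = 3 brokers are illustrative values rather than numbers from the question, the 80% headroom is an arbitrary assumption, and the estimate ignores internal topics and the fact that Kafka deletes whole segments, so real usage can be somewhat higher.

```python
# Rough sizing helpers based on the formula above (hypothetical names).
GiB = 1024 ** 3


def per_broker_bytes(retention_bytes: int, partitions: int, topics: int,
                     replication_factor: int, brokers: int) -> float:
    """retention.bytes x num.partitions x N x replication factor / M."""
    return retention_bytes * partitions * topics * replication_factor / brokers


def max_retention_bytes(disk_bytes: int, partitions: int, topics: int,
                        replication_factor: int, brokers: int,
                        headroom_pct: int = 80) -> int:
    """Largest per-partition limit that keeps the estimate under headroom_pct of the disk."""
    replicas = partitions * topics * replication_factor  # partition replicas in the cluster
    # Integer arithmetic avoids float rounding in the byte count.
    return disk_bytes * headroom_pct * brokers // (100 * replicas)


# Question's settings, plus hypothetical N = 1 topic and M = 3 brokers.
estimate = per_broker_bytes(10 * GiB, partitions=10, topics=1, replication_factor=2, brokers=3)
print(f"estimated per broker: {estimate / GiB:.1f} GiB")  # → estimated per broker: 66.7 GiB

limit = max_retention_bytes(40 * GiB, partitions=10, topics=1, replication_factor=2, brokers=3)
print(f"log.retention.bytes <= {limit}")  # → log.retention.bytes <= 5153960755
```

Even a single topic on three brokers would need about 66.7 GiB per broker under these assumptions, well above the 40GB volume; each additional topic lowers the safe per-partition limit proportionally.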
