
I deploy the Bitnami Kafka Helm chart on AWS infrastructure provisioned with Terraform. I find the documentation on storage and persistence allocation very confusing. From what I understood from the documentation, logs are chunks of messages in a topic; when configurable quotas of bytes, messages, or time are exceeded, the logs get flushed to files in storage.
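
My reading of those quotas, expressed as a values override for the chart (the numbers are purely illustrative, and I have not confirmed that the chart version I use exposes extraEnvVars; the KAFKA_CFG_ prefix is how the Bitnami image maps environment variables to broker properties):

    # Illustrative segment-rolling and flush quotas (not my real values)
    extraEnvVars:
      - name: KAFKA_CFG_LOG_SEGMENT_BYTES            # roll a new segment file after ~1 GiB
        value: "1073741824"
      - name: KAFKA_CFG_LOG_ROLL_HOURS               # or after 168 hours, whichever comes first
        value: "168"
      - name: KAFKA_CFG_LOG_FLUSH_INTERVAL_MESSAGES  # fsync to disk every N messages
        value: "10000"
      - name: KAFKA_CFG_LOG_FLUSH_INTERVAL_MS        # or at least every N milliseconds
        value: "1000"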

The Helm chart's StatefulSet has these volume mounts:

          volumeMounts:
            - name: data
              mountPath: {{ .Values.persistence.mountPath }}
            - name: logs
              mountPath: {{ .Values.logPersistence.mountPath }}

I enable logPersistence in the Helm chart to retain logs in an attached volume provisioned by the chart, unless I provide an alternative (which I don't).
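
For reference, the relevant excerpt of my values override looks roughly like this (sizes here are placeholders, not what I actually allocate):

    persistence:
      enabled: true      # topic data volume
      size: 8Gi
    logPersistence:
      enabled: true      # Kafka server log volume
      size: 2Gi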

What happens if logPersistence is exhausted? Can I configure a fail-safe, i.e. configure Kafka to retain as much as it can and delete the oldest log files when a quota is exceeded? How do I retrieve logs that are persisted? Can a consumer ask for topic messages from an early offset and cause latency or a failure? If so, what are some strategies to recover?

Edit

I have learned, by observing the directories in the Kafka pods, that:

By default the Bitnami Helm chart persists messages to the persistence volume, mounted at /bitnami/kafka/data/.

By default Kafka server logs are sent to stdout; they can be configured to be stored in the logPersistence volume, mounted at /opt/bitnami/kafka/logs.

Naturally these directories become full and can cause a crash.

I do not yet know whether there is a configurable Kafka setting to avoid the crash, i.e. one that clears old messages from storage when it is close to being exhausted.
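
What I am hoping exists is size-based retention at the broker level, something like the sketch below (again unverified for my chart version, and note that log.retention.bytes applies per partition, so it has to be sized against the partition count rather than the whole volume):

    # Hypothetical fail-safe: delete the oldest segments before the data volume fills up
    extraEnvVars:
      - name: KAFKA_CFG_LOG_CLEANUP_POLICY       # "delete" discards old segments (the default policy)
        value: "delete"
      - name: KAFKA_CFG_LOG_RETENTION_BYTES      # per-partition cap, kept well below PVC size / partition count
        value: "536870912"
      - name: KAFKA_CFG_LOG_RETENTION_HOURS      # time-based cap; whichever limit is hit first wins
        value: "72"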

Rubber Duck

1 Answer


(if not, do the messages get deleted?)

If you don't provide that flag, the logs will be sent to stdout only.

What happens if logPersistence is exhausted?

I am not sure what you mean by this; do you mean the volume running out of space? You can use logPersistence.size to set the capacity, and if it runs out of space you will need to resize the volume.
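
As a sketch of what resizing can look like on AWS (not verified against your exact setup): if the StorageClass backing the chart's PVCs allows expansion, you can raise the claim size in place. The StorageClass name below is hypothetical:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp2-expandable        # hypothetical name
    provisioner: kubernetes.io/aws-ebs
    parameters:
      type: gp2
    allowVolumeExpansion: true    # required before a PVC can be grown in place

With that in place, increasing spec.resources.requests.storage on the PersistentVolumeClaim created by the chart should expand the EBS volume and its filesystem.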

Does this happen if a client asks for topic messages from an early offset and causes latency or a failure? If so, what are some strategies to recover?

Could you clarify this question, please?

What is kept in the "persistence" /data volume? Does its usage grow? If it grows, can I configure a fail-safe?

Mainly the logs, if configured. To check whether it fits your needs, you can try deploying the docker-compose setup and inspecting that directory inside the containers.

What, if anything, is kept on the root disk? How big should the root disk be? Does its usage grow? Can I configure a fail-safe?

Could you please clarify this too? What do you mean by the root disk?

EDIT:

Clarification: if logs are flushed to storage, can they be retrieved by a consumer or consumer group?

The logs are stored to avoid losing them when a pod is restarted. To consume those logs you can use whichever approach you prefer; for example, you could use Logstash.
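
As one concrete shape of that idea, a Filebeat sidecar or DaemonSet could tail the persisted log directory and forward it to Logstash; this is only a sketch, and the Logstash host name is an assumption:

    # Hypothetical Filebeat configuration tailing the persisted Kafka server logs
    filebeat.inputs:
      - type: log
        paths:
          - /opt/bitnami/kafka/logs/*.log
    output.logstash:
      hosts: ["logstash:5044"]    # assumed Logstash service name and port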

Clarification: the pod uses local node storage for everything except these directories: data -> /bitnami/kafka and logs -> /opt/bitnami/kafka/logs.

It is difficult to predict what the size of the storage should be; it depends on the amount of data stored. In the case of the logs, more data accumulates the longer Kafka runs. It will also probably depend on the requests made to the Kafka brokers. If the directories /bitnami/kafka/data and /opt/bitnami/kafka/logs are not using local storage, i.e. you are using some other kind of storage, you should be able to update the capacity if needed.

miguelaeh