0

I'm trying to install a RabbitMQ cluster through the Bitnami Helm chart (https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq) in an EKS cluster and when I execute the Helm installation I get the following error in the first pod created:

rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 Welcome to the Bitnami rabbitmq container
rabbitmq 13:41:15.99 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
rabbitmq 13:41:15.99 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 INFO  ==> ** Starting RabbitMQ setup **
rabbitmq 13:41:16.01 INFO  ==> Validating settings in RABBITMQ_* env vars..
rabbitmq 13:41:16.03 INFO  ==> Initializing RabbitMQ...
rabbitmq 13:41:16.03 DEBUG ==> Creating environment file...
rabbitmq 13:41:16.03 DEBUG ==> Creating enabled_plugins file...
rabbitmq 13:41:16.04 DEBUG ==> Creating Erlang cookie...
rabbitmq 13:41:16.04 DEBUG ==> Ensuring expected directories/files exist...
rabbitmq 13:41:16.05 INFO  ==> Starting RabbitMQ in background...
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
2022-04-19 13:41:19.198340+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:19.212884+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:19.212941+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:19.212965+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:19.212985+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:19.213077+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:19.213104+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:19.213124+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2022-04-19 13:41:19.637051+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2022-04-19 13:41:19.637148+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2022-04-19 13:41:19.656264+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2022-04-19 13:41:19.904087+00:00 [info] <0.222.0> ra: starting system quorum_queues
2022-04-19 13:41:19.904200+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0
2022-04-19 13:41:19.995094+00:00 [info] <0.263.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2022-04-19 13:41:20.013384+00:00 [noti] <0.268.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2022-04-19 13:41:20.022921+00:00 [info] <0.222.0> ra: starting system coordination
2022-04-19 13:41:20.022987+00:00 [info] <0.222.0> starting Ra system: coordination in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/coordination/rabbit@rabbitmq-0
2022-04-19 13:41:20.026371+00:00 [info] <0.276.0> ra: meta data store initialised for system coordination. 0 record(s) recovered
2022-04-19 13:41:20.026628+00:00 [noti] <0.281.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Starting RabbitMQ 3.9.8 on Erlang 24.1.2 [jit]
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  ##  ##      RabbitMQ 3.9.8
  ##  ##
  ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Erlang:      24.1.2 [jit]
  TLS Library: OpenSSL - OpenSSL 1.1.1d  10 Sep 2019

  Doc guides:  https://rabbitmq.com/documentation.html
  Support:     https://rabbitmq.com/contact.html
  Tutorials:   https://rabbitmq.com/getstarted.html
  Monitoring:  https://rabbitmq.com/monitoring.html

  Logs: /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
        <stdout>

  Config file(s): /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf

  Starting broker...2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  node           : rabbit@rabbitmq-0
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  home dir       : /opt/bitnami/rabbitmq/.rabbitmq
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  cookie hash    : d3Nfp8t690Ln1h811Tuxzw==
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  log(s)         : /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>                 : <stdout>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  database dir   : /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0
2022-04-19 13:41:20.307590+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:20.307654+00:00 [info] <0.222.0> Feature flags:   [ ] drop_unroutable_metric
2022-04-19 13:41:20.307681+00:00 [info] <0.222.0> Feature flags:   [ ] empty_basic_get_metric
2022-04-19 13:41:20.307705+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:20.307792+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:20.307818+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:20.307838+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:20.307908+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:20.307947+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:20.307968+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
Error: operation wait on node rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local timed out. Timeout value used: 5000
2022-04-19 13:41:23.299211+00:00 [info] <0.222.0> Running boot step pre_boot defined by app rabbit
2022-04-19 13:41:23.299295+00:00 [info] <0.222.0> Running boot step rabbit_global_counters defined by app rabbit
2022-04-19 13:41:23.299545+00:00 [info] <0.222.0> Running boot step rabbit_osiris_metrics defined by app rabbit
2022-04-19 13:41:23.299746+00:00 [info] <0.222.0> Running boot step rabbit_core_metrics defined by app rabbit
2022-04-19 13:41:23.300299+00:00 [info] <0.222.0> Running boot step rabbit_alarm defined by app rabbit
2022-04-19 13:41:23.304497+00:00 [info] <0.297.0> Memory high watermark set to 12695 MiB (13312088473 bytes) of 31738 MiB (33280221184 bytes) total
2022-04-19 13:41:23.308954+00:00 [info] <0.299.0> Enabling free disk space monitoring
2022-04-19 13:41:23.309007+00:00 [info] <0.299.0> Disk free limit set to 50MB
2022-04-19 13:41:23.312489+00:00 [info] <0.222.0> Running boot step code_server_cache defined by app rabbit
2022-04-19 13:41:23.312650+00:00 [info] <0.222.0> Running boot step file_handle_cache defined by app rabbit
2022-04-19 13:41:23.312958+00:00 [info] <0.302.0> Limiting to approx 65439 file handles (58893 sockets)
2022-04-19 13:41:23.313163+00:00 [info] <0.303.0> FHC read buffering: OFF
2022-04-19 13:41:23.313217+00:00 [info] <0.303.0> FHC write buffering: ON
2022-04-19 13:41:23.313829+00:00 [info] <0.222.0> Running boot step worker_pool defined by app rabbit
2022-04-19 13:41:23.313932+00:00 [info] <0.283.0> Will use 4 processes for default worker pool
2022-04-19 13:41:23.313982+00:00 [info] <0.283.0> Starting worker pool 'worker_pool' with 4 processes in it
2022-04-19 13:41:23.314583+00:00 [info] <0.222.0> Running boot step database defined by app rabbit
2022-04-19 13:41:23.314894+00:00 [info] <0.222.0> Node database directory at /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2022-04-19 13:41:23.314963+00:00 [info] <0.222.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2022-04-19 13:41:23.315110+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2022-04-19 13:41:23.316998+00:00 [noti] <0.44.0> Application mnesia exited with reason: stopped

BOOT FAILED
===========
Exception during startup:

2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> BOOT FAILED
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> ===========
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> Exception during startup:
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> error:{badmatch,{error,enoent}}
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:make_request/0, line 121
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:list_nodes/0, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:lock/1, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery:lock/0, line 190
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init_with_lock/3, line 104
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init/0, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:run_step/2, line 46
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
error:{badmatch,{error,enoent}}

    rabbit_peer_discovery_k8s:make_request/0, line 121
    rabbit_peer_discovery_k8s:list_nodes/0, line 41
    rabbit_peer_discovery_k8s:lock/1, line 76
    rabbit_peer_discovery:lock/0, line 190
    rabbit_mnesia:init_with_lock/3, line 104
    rabbit_mnesia:init/0, line 76
    rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
    rabbit_boot_steps:run_step/2, line 46

2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   crasher:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     initial call: application_master:init/4
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     pid: <0.221.0>
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     registered_name: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     exception exit: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>       in function  application_master:init/4 (application_master.erl, line 142)
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     ancestors: [<0.220.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     message_queue_len: 1
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     messages: [{'EXIT',<0.222.0>,normal}]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     links: [<0.220.0>,<0.44.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     dictionary: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     trap_exit: true
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     status: running
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     heap_size: 2586
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     stack_size: 29
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     reductions: 186
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   neighbours:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>
2022-04-19 13:41:24.319087+00:00 [noti] <0.44.0> Application rabbit exited with reason: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /opt/bitnami/rabbitmq/var/log/rabbitmq/erl_crash.dump...done
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
Error:
process_not_running
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
Error:
process_not_running

It seems like the Erlang cookie is not properly distributed but after checking some posts I've not reached any conclusion.

If you have any kind of info that could be helpful I would be grateful if you share it with me.

EDIT 1: I've gone inside the first and only pod of the three replicas that must be created, run rabbitmq-diagnostics erlang_cookie_sources to find out where are the Erland cookie file stored (/opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie) and check if it is the same I've indicated in the values.yaml of the chart and it is exactly the same so in the end I think there's no problem distributing the key but I still have the same problem. Checking again the logs I can see there is some process that is not running, I don't know if the problem should be there.

Felipe
  • 11
  • 4

1 Answers1

1

The problem was the service account token that was not distributed to the pods. I've changed the values.yaml of the Helm chart:

serviceAccount:
  ## @param serviceAccount.create Enable creation of ServiceAccount for RabbitMQ pods
  ##
  create: true
  ## @param serviceAccount.name Name of the created serviceAccount
  ## If not set and create is true, a name is generated using the rabbitmq.fullname template
  ##
  #name: ""
  ## @param serviceAccount.automountServiceAccountToken Auto-mount the service account token in the pod
  ##
  automountServiceAccountToken: true
Felipe
  • 11
  • 4