I've been working on (and off) a deployment of OpenStack over the past few months (nearly a year), and I've come across a number of issues during the deployment, most of which were caused by either bad switch configuration or bad configuration in the heat templates.
I've completed a successful fresh deployment of OpenStack multiple times. However, while preparing the overcloud with projects, I was unable to create an instance. The output of "compute service list" shows nova-compute as down:
openstack compute service list
+----+----------------+----------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+----+----------------+----------------------+----------+---------+-------+----------------------------+
| 1 | nova-conductor | controller-0.host.cp | internal | enabled | up | 2021-04-20T20:43:03.000000 |
| 2 | nova-scheduler | controller-0.host.cp | internal | enabled | up | 2021-04-20T20:43:01.000000 |
| 12 | nova-compute | compute-0.host.cp | nova | enabled | down | 2021-04-20T09:47:52.000000 |
+----+----------------+----------------------+----------+---------+-------+----------------------------+
I also attempted a scale-out with one additional node. It's not present in the list above or in the "hypervisor list", but it is visible in a "server list" from the undercloud node:
openstack server list
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+
| 5cb29129-7ce8-439a-b00b-3868d5a9aa74 | compute-1 | ACTIVE | ctlplane=10.128.0.136 | overcloud-full | baremetal |
| 58c3d587-d2a8-4601-87a7-3fd3d32a78b6 | controller-0 | ACTIVE | ctlplane=10.128.0.5 | overcloud-full | baremetal |
| 288dde8f-5664-42b2-b9f4-333992964dde | compute-0 | ACTIVE | ctlplane=10.128.0.75 | overcloud-full | baremetal |
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+
I've carried out 2 fresh installs, and I'm now faced with the following issue for all compute services that are intended to connect to the Controller node:
2021-04-23 22:28:37.891 7 ERROR nova keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.127.2.8:5000/v3/auth/tokens: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
A manual curl from the compute node to the keystone endpoint yields the following (expected) output:
curl http://10.127.2.8:5000/v3/auth/tokens
{"error":{"code":401,"message":"The request you have made requires authentication.","title":"Unauthorized"}}
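One caveat with the curl above: it sends a GET with no body, whereas the failing call from nova-compute is a POST with a JSON credentials body, so it may not exercise the same path. A closer reproduction from the compute node could look something like the sketch below. The user name, password, project and domain are placeholders I've made up for illustration (not values from my deployment), and the helper names are hypothetical; only the URL comes from the error above.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken from the nova error message above.
KEYSTONE_URL = "http://10.127.2.8:5000/v3/auth/tokens"


def build_token_request(url, user, password, project, domain="Default"):
    """Build (but do not send) a Keystone v3 password-auth POST request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_token_request(req, timeout=10):
    """Send the request and report the HTTP status or the connection error."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # Keystone answered, just not with a 2xx (e.g. 401 for bad credentials).
        return "HTTP %d" % exc.code
    except OSError as exc:
        # Covers URLError and a dropped connection such as RemoteDisconnected.
        return "connection failed: %r" % exc


if __name__ == "__main__":
    # Placeholder credentials: substitute whatever service user nova is configured with.
    req = build_token_request(KEYSTONE_URL, "nova", "CHANGE_ME", "service")
    print(try_token_request(req))
```

If this prints an HTTP status (even 401), the POST is reaching keystone and getting a response; if it prints a connection failure, the problem is reproducible outside of nova.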
I don't believe the network stack is causing this issue; I suspect it's something else. I'd appreciate any assistance with this.
Deployment Information:
- Controller nodes = 1
- Compute nodes = 2 deployed, 4 introspected
- OS = CentOS Stream 8 (both undercloud and overcloud)
Networking:
- 4 Interfaces: 1 primary, 2 port bond (OVS + LACP), 1 storage port
- 2 Juniper EX3400s clustered (LACP configured on bonded ports)
Let me know if any further information is required.
EDIT:
Here is a TCP dump from both Compute and Controller, outlining the transaction of the call to keystone: https://pastebin.com/ADT4RCun