Simple question, but so far very difficult to answer... =-[

I am trying to deploy OpenShift (OKD) 4.5 or 4.7 by following the guide "Installing an OKD 4.5 Cluster". The relevant part is the "Starting the control plane nodes" section.

I'm trying to create the cluster using a UPI (User Provisioned Infrastructure)/Bare Metal (KVM) setup.

PROBLEM:

  • Version 4.5

The master node cannot finish installation after reboot. It keeps showing the following error...

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

For version 4.5 we use "Fedora CoreOS 32.20200715.3.0".

  • Version 4.7

The master node cannot finish installation after reboot. It keeps showing the following error...

[  543.933709] ignition[505]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #112
[  543.939340] ignition[505]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": EOF

For version 4.7 we use "Fedora CoreOS 34.20210518.3.0".


I've waited hours and hours and the master nodes are still in the same situation. What can I do to resolve this?

Thanks! =D


MORE INFORMATION:

See if this helps...

This output appears on okd_master_3 (10.3.0.7)...

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

Connecting to okd_master_2 (10.3.0.6) from okd_services (10.3.0.14)...

NOTE: okd_master_2 (10.3.0.6) was able to boot (it reached the login screen).

[root@okd_services ~]# ssh core@10.3.0.6
The authenticity of host '10.3.0.6 (10.3.0.6)' can't be established.
ECDSA key fingerprint is SHA256:1xdq65g0ljnZYR6uXHaXW6EsxO3u6X268s4Z9Kfq0ng.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.3.0.6' (ECDSA) to the list of known hosts.
Fedora CoreOS 32.20200629.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Pinging okd_bootstrap (10.3.0.4) from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ ping 10.3.0.4
PING 10.3.0.4 (10.3.0.4) 56(84) bytes of data.
64 bytes from 10.3.0.4: icmp_seq=1 ttl=64 time=0.973 ms
64 bytes from 10.3.0.4: icmp_seq=2 ttl=64 time=0.801 ms
64 bytes from 10.3.0.4: icmp_seq=3 ttl=64 time=0.373 ms
64 bytes from 10.3.0.4: icmp_seq=4 ttl=64 time=0.647 ms
^C
--- 10.3.0.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3032ms
rtt min/avg/max/mdev = 0.373/0.698/0.973/0.220 ms

Calling the problematic URL from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ curl -kv https://api-int.mbr.okd.local:22623/config/master
*   Trying 10.3.0.14:22623...
* Connected to api-int.mbr.okd.local (10.3.0.14) port 22623 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api-int.mbr.okd.local
*  start date: Jun 16 23:52:22 2021 GMT
*  expire date: Jun 14 23:52:23 2031 GMT
*  issuer: OU=openshift; CN=root-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561ed249aa40)
> GET /config/master HTTP/2
> Host: api-int.mbr.okd.local:22623
> user-agent: curl/7.69.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 500 
< content-length: 0
< date: Thu, 17 Jun 2021 14:55:43 GMT
< 
* Connection #0 to host api-int.mbr.okd.local left intact
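
A minimal sketch for narrowing down the HTTP 500 above: HAProxy on 10.3.0.14 only forwards TCP on port 22623, so asking each backend listed in haproxy.cfg directly should show which machine is actually answering. The function names `classify_mcs_code` and `probe_mcs` are hypothetical helpers invented for this illustration, not part of OKD; only standard curl options are used.

```shell
# Map the status printed by curl's %{http_code} to a short verdict.
# curl prints 000 when it never received an HTTP response.
classify_mcs_code() {
    case "$1" in
        200) echo "serving config" ;;
        000) echo "no HTTP response" ;;
        5??) echo "reachable, server-side error" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Ask one backend directly on port 22623, bypassing HAProxy on 10.3.0.14.
# -k skips certificate verification, as in the curl call above.
probe_mcs() {
    code=$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' \
        "https://$1:22623/config/master")
    echo "$1: $code ($(classify_mcs_code "$code"))"
}

# Usage from any node on OKD_LAN (not run here):
#   for ip in 10.3.0.4 10.3.0.5 10.3.0.6 10.3.0.7; do probe_mcs "$ip"; done
```

If only one address returns anything other than 000, that is the backend HAProxy is handing the masters to.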

INFRASTRUCTURE:

Virtual machines...

NAME           ROLE                   OS              IP          MAC
okd_boostrap   bootstrap              Fedora CoreOS   10.3.0.4    52:54:00:07:80:62
okd_master_1   master                 Fedora CoreOS   10.3.0.5    52:54:00:7d:97:70
okd_master_2   master                 Fedora CoreOS   10.3.0.6    52:54:00:6e:52:85
okd_master_3   master                 Fedora CoreOS   10.3.0.7    52:54:00:a3:65:d9
okd_worker_1   worker                 Fedora CoreOS   10.3.0.8    52:54:00:e3:7c:fb
okd_worker_2   worker                 Fedora CoreOS   10.3.0.9    52:54:00:20:ec:4f
okd_services   DNS/LB/web/NFS         CentOS 8        10.3.0.14   52:54:00:3a:fd:a2
                                                      10.2.0.18   52:54:00:92:ce:78
okd_pfsense    firewall/router/DHCP   FreeBSD         10.3.0.2    52:54:00:d8:27:82
                                                      10.2.0.19   52:54:00:ac:82:7d

 . OKD_LAN: "10.3.0";
 . EXT_LAN: "10.2.0".

Some acronyms...
 _ DNS - Domain Name System;
 _ LB - Load Balancing;
 _ Web - Web Server;
 _ NFS - Network File System.

Network layout...

           ...→.[N]WAN/EXT_LAN([R]dhcp).←... (10.2.0.0/24)
           ↓                               ↓
          [I]WAN/EXT_LAN                  [I]WAN/EXT_LAN
  [V]OKD_PFSENSE([R]dhcp)                 [V]OKD_SERVICES
          [I]OKD_LAN                      [I]OKD_LAN
           ↑                               ↑
           .........→.[N]OKD_LAN.←.......... (10.3.0.0/24)
                       ↑
      ...................................
      ↓                ↓                ↓
     [V]OKD_BOOSTRAP  [V]OKD_MASTER_1  [V]OKD_WORKER_1
                      [V]OKD_MASTER_2  [V]OKD_WORKER_2
                      [V]OKD_MASTER_3

 _ [N] - Network;
 _ [R] - Network Resource;
 _ [I] - Network Interface;
 _ [V] - Virtual Machine.

CONFIGURATION FILES:

BIND 9 (DNS):

. db.10.3.0

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        6       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services.okd.local.

; Name servers - "PTR" records.
14 IN  PTR okd-services.okd.local.

; OpenShift container platform cluster - "PTR" records.
4 IN  PTR okd-boostrap.mbr.okd.local.
5 IN  PTR okd-master-1.mbr.okd.local.
6 IN  PTR okd-master-2.mbr.okd.local.
7 IN  PTR okd-master-3.mbr.okd.local.
8 IN  PTR okd-worker-1.mbr.okd.local.
9 IN  PTR okd-worker-2.mbr.okd.local.
14 IN  PTR api.mbr.okd.local.
14 IN  PTR api-int.mbr.okd.local.

. db.okd.local

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        1       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services

; Name servers - "A" records.
okd-services.okd.local. IN A 10.3.0.14

; OpenShift container platform cluster - "A" records.
okd-boostrap.mbr.okd.local. IN  A   10.3.0.4
okd-master-1.mbr.okd.local. IN  A   10.3.0.5
okd-master-2.mbr.okd.local. IN  A   10.3.0.6
okd-master-3.mbr.okd.local. IN  A   10.3.0.7
okd-worker-1.mbr.okd.local. IN  A   10.3.0.8
okd-worker-2.mbr.okd.local. IN  A   10.3.0.9

; OpenShift internal cluster IPs - "A" records.
api.mbr.okd.local.              IN  A   10.3.0.14
api-int.mbr.okd.local.          IN  A   10.3.0.14
*.apps.mbr.okd.local.           IN  A   10.3.0.14
etcd-0.mbr.okd.local.           IN  A   10.3.0.5
etcd-1.mbr.okd.local.           IN  A   10.3.0.6
etcd-2.mbr.okd.local.           IN  A   10.3.0.7
cons-okd.apps.mbr.okd.local.    IN  A   10.3.0.14
oauth-okd.apps.mbr.okd.local.   IN  A   10.3.0.14

; OpenShift internal cluster IPs - "SRV" records.
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-0.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-1.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-2.mbr

. named.conf.local

zone "okd.local" {
    type master;
    file "/etc/named/zones/db.okd.local"; // Zone file path.
};

zone "0.3.10.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.10.3.0"; // 10.3.0.0/24 subnet.
};
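
A minimal sketch for checking that the two zones above resolve as intended, assuming `dig` is installed and the BIND server answers on 10.3.0.14. `ptr_name` and `check_dns` are hypothetical helpers; `ptr_name` only shows how a record such as `14 IN PTR` in zone `0.3.10.in-addr.arpa` corresponds to the address 10.3.0.14.

```shell
# Build the reverse-lookup name for an IPv4 address,
# e.g. 10.3.0.14 -> 14.0.3.10.in-addr.arpa
ptr_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Query one forward, one reverse and the SRV records against 10.3.0.14.
# Defined only; call it on a machine that can reach the DNS server.
check_dns() {
    dig +short api-int.mbr.okd.local A @10.3.0.14
    dig +short "$(ptr_name 10.3.0.5)" PTR @10.3.0.14
    dig +short _etcd-server-ssl._tcp.mbr.okd.local SRV @10.3.0.14
}

# Usage (not run here):
#   check_dns
```

An empty answer for any of the three queries would point at the zone files rather than at the cluster itself.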

. named.conf

//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS server
// as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the configuration
// located in /usr/share/doc/bind-{version}/Bv9ARM.html .

options {
    listen-on port 53 { 127.0.0.1; 10.3.0.14; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    recursing-file "/var/named/data/named.recursing";
    secroots-file "/var/named/data/named.secroots";
    allow-query { localhost; 10.3.0.0/24; };

    // - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
    // - If you are building a RECURSIVE (caching) DNS server, you need to enable
    // recursion.
    // - If your recursive DNS server has a public IP address, you MUST enable access
    // control to limit queries to your legitimate users. Failing to do so will cause
    // your server to become part of large scale DNS amplification attacks. Implementing
    // BCP38 within your network would greatly reduce such attack surface.
    recursion yes;

    forwarders {
        8.8.8.8;
        8.8.4.4;
    };

    dnssec-enable yes;
    dnssec-validation yes;

    // Path to ISC DLV key.
    bindkeys-file "/etc/named.root.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};

zone "." IN {
    type hint;
    file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/named/named.conf.local";

HAProxy (load balancer):

. haproxy.cfg

#---------------------------------------------------------------------
# Global settings.
#---------------------------------------------------------------------
global
    maxconn 20000
    log /dev/log local0 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    daemon

    # Turn on stats unix socket.
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# Common defaults that all the "listen" and "backend" sections will use if not designated
# in their block.
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 300s
    timeout server 300s
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 20000

listen stats
    bind :9000
    mode http
    option forwardfor except 127.0.0.0/8
    stats enable
    stats uri /

frontend okd_k8s_api_fe
    bind :6443
    default_backend okd_k8s_api_be
    mode tcp
    option tcplog

backend okd_k8s_api_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:6443 check
    server okd-master-1 10.3.0.5:6443 check
    server okd-master-2 10.3.0.6:6443 check
    server okd-master-3 10.3.0.7:6443 check

frontend okd_machine_config_server_fe
    bind :22623
    default_backend okd_machine_config_server_be
    mode tcp
    option tcplog

backend okd_machine_config_server_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:22623 check
    server okd-master-1 10.3.0.5:22623 check
    server okd-master-2 10.3.0.6:22623 check
    server okd-master-3 10.3.0.7:22623 check

frontend okd_http_ingress_traffic_fe
    bind :80
    default_backend okd_http_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_http_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:80 check
    server okd-worker-2 10.3.0.9:80 check

frontend okd_https_ingress_traffic_fe
    bind *:443
    default_backend okd_https_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_https_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:443 check
    server okd-worker-2 10.3.0.9:443 check
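
A minimal sketch for seeing which of the servers above HAProxy currently considers up, assuming the `listen stats` section on port 9000 is reachable and that the CSV export is served by appending `;csv` to the stats URI. `backend_status` is a hypothetical helper; it relies on the status being the 18th column of that CSV.

```shell
# Read HAProxy stats CSV on stdin and print "proxy/server: STATUS"
# for every server line. The header line (starting with '#') and the
# FRONTEND/BACKEND summary rows are skipped.
backend_status() {
    awk -F, '!/^#/ && $2 != "FRONTEND" && $2 != "BACKEND" {
        print $1 "/" $2 ": " $18
    }'
}

# Usage on okd_services (not run here):
#   curl -s 'http://10.3.0.14:9000/;csv' | backend_status
```

During bootstrap it would be normal for only okd-boostrap to show as UP in the 6443 and 22623 backends.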

OpenShift (OKD) "*.yaml" files:

. htpasswd_provider.yaml

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

. install-config.yaml

apiVersion: v1
baseDomain: okd.local
metadata:
  name: mbr

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0

controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

platform:
  none: {}

fips: false

pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAA<SKIPPED>QbAKPwwhdCkTpd8= root@okd_services.my_domain.com.br'

. registry_pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 45Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 10.3.0.14

UPDATE:

. netstat -natup output...

[root@okd_services ~]# netstat -natup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      906/sshd            
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      929/named           
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd           
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1742/dnsmasq        
tcp        0      0 10.3.0.14:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 10.2.0.18:22            10.2.0.3:44536          ESTABLISHED 1854/sshd: root [pr 
tcp        0      0 10.3.0.14:52252         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52134         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:42222         10.3.0.8:443            SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51962          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52130         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51946          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:40530         10.3.0.9:443            SYN_SENT    4572/haproxy        
tcp        0    196 10.2.0.18:22            10.2.0.3:44538          ESTABLISHED 5000/sshd: root [pr 
tcp        0      0 10.2.0.18:45472         10.2.0.5:389            ESTABLISHED 878/sssd_be         
tcp        0      0 10.3.0.14:51970         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:54056         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:33328         147.75.69.225:80        TIME_WAIT   -                   
tcp        0      0 10.3.0.14:6443          10.3.0.5:39976          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:52462          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:41396         10.3.0.7:22623          SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:41964         10.3.0.9:80             SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:60674         10.3.0.7:6443           SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:40024          ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:43394         109.205.222.4:80        TIME_WAIT   -                   
tcp6       0      0 :::22                   :::*                    LISTEN      906/sshd            
tcp6       0      0 ::1:953                 :::*                    LISTEN      929/named           
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd           
tcp6       0      0 :::8080                 :::*                    LISTEN      1131/httpd          
tcp6       0      0 :::53                   :::*                    LISTEN      929/named           
udp        0      0 192.168.122.1:53        0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:53            0.0.0.0:*                           929/named           
udp        0      0 127.0.0.1:53            0.0.0.0:*                           929/named           
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:68            10.3.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 10.2.0.18:68            10.2.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd           
udp        0      0 127.0.0.1:323           0.0.0.0:*                           857/chronyd         
udp6       0      0 :::53                   :::*                                929/named           
udp6       0      0 :::111                  :::*                                1/systemd           
udp6       0      0 ::1:323                 :::*                                857/chronyd
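
A minimal sketch for pulling the unanswered connections out of the listing above, assuming the same `netstat` column layout (foreign address in column 5, state in column 6). `unanswered_backends` is a hypothetical helper.

```shell
# Read `netstat -nat` style lines on stdin and print, once each, every
# foreign address that a TCP connection is still waiting on (SYN_SENT).
unanswered_backends() {
    awk '$1 ~ /^tcp/ && $6 == "SYN_SENT" { print $5 }' | LC_ALL=C sort -u
}

# Usage on okd_services (not run here):
#   netstat -nat | unanswered_backends
```

Applied to the output above, this would list the ports on 10.3.0.7, 10.3.0.8 and 10.3.0.9 that HAProxy's health checks are still trying to reach.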

Thanks! =D

Eduardo Lucio