1

I have two coreos stable v1122.2.0 machines, each one with etcd2 configured with tls.

I created the certificates using https://github.com/coreos/etcd/tree/master/hack/tls-setup.

now I'm trying to configure calico-node to operate on my coreos master node with rkt.

I have the following in cloud-config configuration:

write_files:
 - path: "/etc/kubernetes/cni/net.d/10-calico.conf"
   content: |
     {
     "name": "calico",
     "type": "flannel",
     "delegate": {
         "type": "calico",
         "etcd_endpoints": "https://10.79.218.2:2379,https://10.79.218.3:2379",
         "log_level": "none",
         "log_level_stderr": "info",
         "hostname": "10.79.218.2",
         "policy": {
             "type": "k8s",
             "k8s_api_root": "http://127.0.0.1:8080/api/v1/"
             }
         }
     }
 - path: "/etc/kubernetes/manifests/policy-controller.yaml"
   content: |
    apiVersion: v1
     kind: Pod
     metadata:
       name: calico-policy-controller
       namespace: calico-system
     spec:
       hostNetwork: true
       containers:
         # The Calico policy controller.
         - name: k8s-policy-controller
           image: calico/kube-policy-controller:v0.2.0
           env:
             - name: ETCD_ENDPOINTS
               value: "https://10.79.218.2:2379,https://10.79.218.3:2379"
             - name: K8S_API
               value: "http://127.0.0.1:8080"
             - name: LEADER_ELECTION
               value: "true"
         # Leader election container used by the policy controller.
         - name: leader-elector
           image: quay.io/calico/leader-elector:v0.1.0
           imagePullPolicy: IfNotPresent
           args:
             - "--election=calico-policy-election"
             - "--election-namespace=calico-system"
             - "--http=127.0.0.1:4040"
...
units:
 - name: calico-node.service
   enable: true
   command: start
   content: |
    [Unit]
    Description=Calico per-host agent
    Requires=network-online.target
    After=network-online.target

    [Service]
    Slice=machine.slice
    Environment=CALICO_DISABLE_FILE_LOGGING=true
    Environment=HOSTNAME=10.79.218.2
    Environment=IP=10.79.218.2
    Environment=FELIX_FELIXHOSTNAME=10.79.218.2
    Environment=CALICO_NETWORKING=false
    Environment=NO_DEFAULT_POOLS=true
    Environment=ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
    ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
   --volume=modules,kind=host,source=/lib/modules,readOnly=false \
   --mount=volume=modules,target=/lib/modules \
   --trust-keys-from-https quay.io/calico/node:v0.19.0

   KillMode=mixed
   Restart=always
   TimeoutStartSec=0

   [Install]
   WantedBy=multi-user.target

please ignore the space indentation.. i don't think I copy/paste it properly :)

when I try to start calico-node service I get the following error:

Sep 14 05:45:17 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:17 localhost rkt[1644]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:18 localhost rkt[1644]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:25 localhost rkt[1644]: Traceback (most recent call last):
Sep 14 05:45:25 localhost rkt[1644]:   File "startup.py", line 292, in <module>
Sep 14 05:45:25 localhost rkt[1644]:     client = IPAMClient()
Sep 14 05:45:25 localhost rkt[1644]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:25 localhost rkt[1644]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:25 localhost rkt[1644]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
Sep 14 05:45:25 localhost rkt[1644]: Calico node failed to start
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Unit entered failed state.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Failed with result 'exit-code'.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Sep 14 05:45:25 localhost systemd[1]: Stopped Calico per-host agent.
Sep 14 05:45:25 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:25 localhost rkt[1714]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:26 localhost rkt[1714]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:28 localhost rkt[1714]: Traceback (most recent call last):
Sep 14 05:45:28 localhost rkt[1714]:   File "startup.py", line 292, in <module>
Sep 14 05:45:28 localhost rkt[1714]:     client = IPAMClient()
Sep 14 05:45:28 localhost rkt[1714]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:28 localhost rkt[1714]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:28 localhost rkt[1714]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m

lines 2-25

so I get Invalid ETCD_CA_CERT_FILE.. I didn't really specify to calico what keys to use..so I guess i'm missing some configuration.

I have the following etc related keys at /etc/ssl/etcd

8 -rw-------. 1 etcd etcd 1050 Sep 14 05:45 ca.pem
8 -rw-------. 1 etcd etcd  289 Sep 14 05:45 etcd1-key.pem
8 -rw-------. 1 etcd etcd 1058 Sep 14 05:45 etcd1.pem
8 -rw-------. 1 etcd etcd  227 Sep 12 03:49 server1-key.pem
8 -rw-------. 1 etcd etcd  822 Sep 12 03:49 server1.pem

I tried adding Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem to the calico-node systemd file, but I get the exact same results.

any ideas ?

update

so I tried to run calico manually, not with systemd. and I also added all the required environment variables that calico requires

export CALICO_DISABLE_FILE_LOGGING=true
export HOSTNAME=10.79.218.2
export IP=10.79.218.2
export FELIX_FELIXHOSTNAME=10.79.218.2
export CALICO_NETWORKING=false
export NO_DEFAULT_POOLS=true
export ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
export ETCD_AUTHORITY=10.79.218.2:2379
export ETCD_SCHEME=https
export ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
export ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
export ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem

when I try to execute the calico container with:

/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
 --volume=modules,kind=host,source=/lib/modules,readOnly=false \
 --mount=volume=modules,target=/lib/modules \
 --trust-keys-from-https quay.io/calico/node:v0.19.0

I get

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
   client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 221, in __init__
    ETCD_CERT_FILE_ENV, etcd_cert))
pycalico.datastore_errors.DataStoreError: Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE. Both must be readable file paths. Values provided: ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem, ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem

I changed the file permissions of the certificate files to 666 but that doesn't resolve the issue. and I know that these certificates are valid because etcd tls works properly. so what am I missing?

update 2

It appears I was missing to mount the certificates directory on the calico container.

so now I'm running the calico container with

/usr/bin/rkt run --volume etcd-ssl,kind=host,source=/etc/ssl/etcd/,readOnly=true --inherit-env --stage1-from-dir=stage1-fly.aci  --volume=modules,kind=host,source=/lib/modules,readOnly=false  --mount=volume=modules,target=/lib/modules  --trust-keys-from-https quay.io/calico/node:v0.19.0 --mount volume=etcd-ssl,target=/etc/ssl/etcd

I get the following output:

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 246, in __init__
allow_reconnect=True)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 204, in __init__
set(self.machines))
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 299, in machines
return self.machines
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 301, in machines
    raise etcd.EtcdException("Could not get the list of servers, "
etcd.EtcdException: Could not get the list of servers, maybe you provided the wrong host(s) to connect to?
Calico node failed to start

I'm a bit closer.. but still no solution.

update 3

I tried setting ETCD_ENDPOINTS to the etcd server on the coreos machine by running export ETCD_ENDPOINTS=https://10.79.218.2:2379, and now when i try to run the calico rkt image i get:

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 295, in <module>
main()
  File "startup.py", line 251, in main
warn_if_hostname_conflict(ip)
  File "startup.py", line 192, in warn_if_hostname_conflict
current_ipv4, _ = client.get_host_bgp_ips(hostname)
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 132, in wrapped
"running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to SSLError(CertificateError("hostname '10.79.218.2' doesn't match u'etcd'",),)).  Is etcd running?
Calico node failed to start
ufk
  • 323
  • 3
  • 7
  • 26

2 Answers2

2

I also had this problem, and eventually found the source of the issue by looking at the code for the etcd connection logic and the libraries used, and some pointers from the Calico team in their Slack channel.

The problem is because the current version (at least up to 0.22.0) of Calico uses a Python etcd client which does not support IP SAN (Subject Alt Name) in TLS certificates. This means that the certificates you are using cannot be correctly associated with the etcd servers they are configured on.

This is described in this GitHub issue.

To fix this, you must either wait until a new release of the urllib library is made, it is picked up by the etcd client, and a new release of that is made, and Calico is updated to use the new etcd client. Alternatively, you can re-generate the certificates using FQDNs instead of IP addresses in the SAN fields. This means you will need make sure your servers are accessible through those names, either using DNS or setting /etc/hosts correctly. The OpenSSL configuration for generating certificates should contain something like this:

[alt_names]
DNS.1 = $ENV::FQDN

The link describing how you generated the certificates uses CFSSL so I suggest reading its documentation on how to change to using hostnames instead of IP addresses. I believe it may be as simple as modifying the JSON configuration as follows:

"hosts": [
    "example.com",
    "www.example.com"
],
grkvlt
  • 186
  • 4
0

I find that with this flaky library I can succeed if: the client opens a connection to an IP address; the server's cert asserts that IP address in the Subject; and the server's cert does NOT have any DNS type entries in the Subject Alternative Name list. Following is selected output from openssl x509 -text ... for an example server cert that works when the client opens a connection using the IP address 10.10.10.1 to identify the server:

...
        Subject: CN=10.10.10.1
...
        X509v3 extensions:
            X509v3 Basic Constraints: 
                CA:FALSE
            X509v3 Key Usage: 
                Digital Signature, Non Repudiation, Key Encipherment
            X509v3 Subject Alternative Name: 
                IP Address:100.127.0.2, IP Address:100.127.0.2, IP Address:10.10.10.1
...

Also, there are newer versions of the Calico images. I have heard only two bad things about calico/node:v0.23.0. One is from someone else --- https://calicousers.slack.com/archives/kubernetes/p1478206011002345 . I have done some testing of that image myself and hand only one problem, https://github.com/projectcalico/calico-containers/issues/1107 . There are v1.0.0 betas and an rc1 right now, I have not heard bad things about them.