🖥️ Installing Kubernetes - Phase 2

After installing Ceph and Kubernetes, I continue in this article with the installation of Kubernetes and OpenStack ☁️.

This turned out to be a journey with quite a few hurdles 😀 along the way.

#Kubernetes #digitalsovereignty #OpenStack #IPv6 #CloudComputing #DevOps #Homelab #ARM


Planning

OpenStack exposes several endpoints to make both the dashboard and APIs available. These HTTP endpoints are secured using TLS.

The endpoints can be generated using the following Ansible command, as described in the Atmosphere documentation:

ansible-playbook -e domain_name="yourdomain.tld" \
                 -e workspace_path="$(pwd)/cloud-config/inventory" \
                 vexxhost.atmosphere.generate_workspace

For the domain name, I used infra.os.domain.tld. Later, I changed the dashboard address to os.domain.tld. This keeps the GUI URL clean and also allows me, in theory, to use a wildcard TLS certificate for the remaining APIs.
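
To illustrate why the split between os.domain.tld and infra.os.domain.tld matters for a wildcard certificate, here is a minimal sketch of wildcard name matching. It assumes the common rule that a leading `*` covers exactly one DNS label; the hostname `identity.infra.os.domain.tld` is a hypothetical API endpoint, not one taken from the generated workspace.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: a leading '*' label matches exactly one DNS label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Everything after the wildcard label must match literally.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


# Hypothetical API endpoint below the infra subdomain: covered by the wildcard.
print(wildcard_matches("*.infra.os.domain.tld", "identity.infra.os.domain.tld"))
# The dashboard name sits above the wildcard and needs its own certificate.
print(wildcard_matches("*.infra.os.domain.tld", "os.domain.tld"))
```

Under these assumptions the first call prints True and the second prints False, which is why the dashboard would need a separate certificate next to the wildcard.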

The vexxhost.atmosphere.generate_workspace playbook generates several configuration files, which form a solid base for the installation.

This playbook generates:

Afterwards, I started making adjustments to support the desired setup, such as OVN, IPv6, and other custom settings.


TLS

Atmosphere supports multiple TLS providers.

Possible solutions would be:

Although my cluster is only accessible privately, I still wanted to use a public certificate to simplify management.

The advantage of a public certificate is that it is already trusted by the default root certificate store. This avoids the need to create and distribute a private CA.

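Additionally, Python tooling that relies on the certifi CA bundle (requests, for example) ignores the system trust store, so a private CA would have to be configured separately for each client.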

This excludes the following solutions:

In my setup, I use a hidden PowerDNS master.

I considered implementing acme-dns into Atmosphere and using the hosted service at https://acme-dns.io/. The advantage is that it only requires an API key and a CNAME pointing to the acme-dns.io service.

However, the disadvantages are:

This approach could work if only a single domain is used for Horizon and a wildcard certificate is used for the rest, for example:

Unfortunately, Atmosphere does not support acme-dns or ACME wildcards.

This leaves RFC2136 as the only viable solution.


PowerDNS Setup

To enable DNS updates on the PowerDNS server, I enabled RFC2136 by setting the following option in /etc/powerdns/pdns.conf:

dnsupdate=yes

Next, ACLs and keys must be configured to allow zone updates.

1. Configure the update ACL

root@pdns:~# pdnsutil set-meta example.tld ALLOW-DNSUPDATE-FROM 3fff:0:0:40::/64
Set 'example.tld' meta ALLOW-DNSUPDATE-FROM = 3fff:0:0:40::/64

2. Create a TSIG key

root@pdns:~# pdnsutil generate-tsig-key example hmac-sha512
Create new TSIG key example hmac-sha512 XXXXXXXXXXXXXXXXXXXXXXXX

3. Import the key

root@pdns:~# pdnsutil import-tsig-key example hmac-sha512 'XXXXXXXXXXXXXXXXXXXXXXXX'
Imported TSIG key example hmac-sha512

4. Assign the key to the zone

root@pdns:~# pdnsutil set-meta example.tld TSIG-ALLOW-DNSUPDATE example
Set 'example.tld' meta TSIG-ALLOW-DNSUPDATE = example
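
The ACL from step 1 can be sanity-checked before pointing Atmosphere at the server. The sketch below only tests whether an address falls inside the allowed prefix; the two client addresses are hypothetical and should be replaced with the addresses your nodes actually use to reach PowerDNS.

```python
import ipaddress

# Prefix from the ALLOW-DNSUPDATE-FROM metadata set in step 1.
ALLOWED = ipaddress.ip_network("3fff:0:0:40::/64")


def may_update(client: str) -> bool:
    """Return True if the client address falls inside the allowed prefix."""
    return ipaddress.ip_address(client) in ALLOWED


print(may_update("3fff:0:0:40::25"))  # hypothetical node inside the /64
print(may_update("3fff:0:0:41::25"))  # hypothetical node in a neighbouring /64
```

If a node sends its updates from an address outside this prefix, the ACL is the first thing to check.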

After creating and assigning the key, Atmosphere can be configured with the appropriate settings:

cluster_issuer_type: acme
cluster_issuer_acme_solver: rfc2136

cluster_issuer_acme_email: email@address.com
cluster_issuer_acme_rfc2136_nameserver: "ns.{{ tld }}"
cluster_issuer_acme_rfc2136_tsig_algorithm: HMACSHA512
cluster_issuer_acme_rfc2136_tsig_key_name: example
cluster_issuer_acme_rfc2136_tsig_secret_key: XXXXXXXXXXXXXXXXXXXXXXXX
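
Before letting cert-manager use the key, it can help to test a dynamic update by hand. The following is a rough sketch that builds a batch for the `nsupdate` tool: it adds a TXT record and removes it again. The server name `ns.example.tld` and the record name `_acme-challenge.test.example.tld.` are assumptions for illustration only.

```python
def nsupdate_batch(server: str, zone: str, name: str, value: str, ttl: int = 60) -> str:
    """Build an nsupdate batch that adds and then removes a test TXT record."""
    lines = [
        f"server {server}",
        f"zone {zone}",
        f'update add {name} {ttl} TXT "{value}"',
        "send",
        f"update delete {name} TXT",
        "send",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical server and record names; adjust them to your own zone.
print(
    nsupdate_batch("ns.example.tld", "example.tld",
                   "_acme-challenge.test.example.tld.", "probe"),
    end="",
)
```

The output can be piped into `nsupdate -y hmac-sha512:example:<secret>`, using the key name and secret created with pdnsutil. If both updates are accepted, the ACL and the TSIG key are working together.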

Kubernetes Networking

Connectivity between Kubernetes nodes is handled by Cilium. Atmosphere configures Cilium to use VXLAN encapsulation. However, when I tested connectivity, no traffic was passing through.

The root cause is that older versions of Cilium do not support IPv6-only networks.

Two potential solutions were considered:

After troubleshooting both approaches, upgrading Cilium turned out to be the easiest solution. IPv6-only networking support was fixed starting from Cilium v1.18.0.

In the ansible-collection-kubernetes playbook, I had to specify the underlay-protocol. The Cilium Helm chart used by Atmosphere at the time did not support this option, so I backported it manually.

Additionally, the kube-vip CIDR was incorrectly configured for IPv6. It should be /128.
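
The reason for /128 is that a virtual IP is a single host address. As a small illustration, assuming the incorrect value was the IPv4-style /32 (the exact wrong value is not recorded here) and using a hypothetical VIP address:

```python
import ipaddress

# Hypothetical VIP; a /128 describes exactly one IPv6 address.
vip = ipaddress.ip_interface("3fff:0:0:40::100/128")
print(vip.network, vip.network.num_addresses)

# Assumed misconfiguration: the IPv4 host mask /32 applied to an IPv6 address.
wrong = ipaddress.ip_interface("3fff:0:0:40::100/32")
print(wrong.network, wrong.network.num_addresses == 2 ** 96)
```

With /128 the network contains only the VIP itself, whereas a /32 on an IPv6 address covers the whole 3fff::/32 block.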


Cilium and Multiple Default Gateways

Cilium does not handle nodes with multiple default gateways well, as it cannot determine which interface should bind the VXLAN port.

To resolve this, I explicitly configured the devices:

cilium_helm_values:
  devices:
    - lo
  nodePort:
    directRoutingDevice: lo
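
To check whether a node is affected, the default routes can be listed with `ip -6 route show default` and grouped by interface. The sketch below parses that output; the sample text and the interface names `eth0` and `eth1` are made up for illustration.

```python
def default_route_devices(route_output: str) -> list[str]:
    """Return the distinct interfaces that carry a default route, in order."""
    devices: list[str] = []
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default" or "dev" not in fields:
            continue
        idx = fields.index("dev")
        if idx + 1 >= len(fields):
            continue  # malformed line without an interface name
        dev = fields[idx + 1]
        if dev not in devices:
            devices.append(dev)
    return devices


# Hypothetical output of `ip -6 route show default` on a dual-uplink node.
SAMPLE = """\
default via fe80::1 dev eth0 proto ra metric 1024 pref medium
default via fe80::1 dev eth1 proto ra metric 1024 pref medium
"""
print(default_route_devices(SAMPLE))  # ['eth0', 'eth1']
```

More than one interface in the result means the node has multiple default gateways, which is the situation where pinning the devices explicitly helped in my setup.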

Kubernetes Upgrade

After upgrading Kubernetes to version 1.35.0, overall stability and behavior improved.


Percona PXC Cluster

Atmosphere uses a Percona PXC cluster for its databases. During setup, the database entered an error state and failed to synchronize.

Issues observed:

To troubleshoot this, the following script was extremely helpful: https://github.com/rtulke/portcheck/blob/main/pcng.sh

HAProxy

HAProxy pods were not listening on IPv6 sockets.
This was fixed by changing the bind address from * to ::.


NGINX Ingress

The default backend image for ingress-nginx lacks ARM support.

Switching to:

docker.io/dyrnq/defaultbackend-arm64:1.5

resolved the issue.


RabbitMQ and Keystone

After enabling IPv6 in the RabbitMQ configuration, connectivity improved:

rabbitmq_spec:
  rabbitmq:
    erlangInetConfig: |
      {inet6, true}.
    envConfig: |
      SERVER_ADDITIONAL_ERL_ARGS="-kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
      RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"

However, the management port still only listens on IPv4:

beam.smp 11 rabbitmq 0.0.0.0:15672 0.0.0.0:0 LISTEN
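
Lines like the one above can be classified automatically when checking many pods. This is a minimal sketch that assumes the local address is the fourth whitespace-separated field, as in the listing above; the IPv6 example line is hypothetical.

```python
import ipaddress


def listener_family(line: str) -> str:
    """Classify a listener line by the address family of its local address."""
    local = line.split()[3]
    # Split off the port; works for '0.0.0.0:15672', ':::5672' and '[::]:5672'.
    host, _, _port = local.rpartition(":")
    host = host.strip("[]")
    try:
        return f"IPv{ipaddress.ip_address(host).version}"
    except ValueError:
        return "unknown"


print(listener_family("beam.smp 11 rabbitmq 0.0.0.0:15672 0.0.0.0:0 LISTEN"))
# Hypothetical IPv6 listener line for comparison.
print(listener_family("beam.smp 11 rabbitmq :::5672 :::0 LISTEN"))
```

Note that a socket bound to `::` may also accept IPv4 connections depending on the system's dual-stack settings, so "IPv6" here describes only the bound address.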

Open Issues

You can follow me on Mastodon at @mpiscaer@mastodon.social, or send me a Matrix message at @michiel:piscaer.com.