Kinetic Upgrade
This upgrade includes the following changes:
- A significant rewrite of the orchestration system that touches every endpoint and the way data is tracked across the cluster. In general, we have:
  - Stopped relying on fire-and-forget events. Orchestration now relies on grains stored in the mine (which can be queried at any time) to determine when it is safe to proceed.
  - Stopped using a single salt-run process for every endpoint. Instead, each endpoint type gets its own salt-run process (with memory savings of about 90%).
  - Started using in-memory dictionaries instead of ugly hacks with jinja variables and salt modules for simple things.
  - Moved closer to true multi-master with all endpoint types. (No longer track spawning for virtual endpoints to avoid service creation race conditions, etc.)
  - Made QoL coding style fixes and standardization. (There is a much bigger issue to be resolved here relating to taking full advantage of code-reuse opportunities.)
- Standardized the shipped version of MariaDB
- Added Redfish support and removed the use of ipmitool
- Rewrote networking to use networkd instead of state.network
- Added a pyghmi fallback to set_bootonce to support old firmware that does not support BootSourceOverrideMode in Redfish
- Added support for CentOS 8
- Added support for Ubuntu 20.04 (Focal)
- Added support for OpenStack Ussuri
- Added support for Ceph Octopus
- Added support for Danos
- Added support for FRR (future implementation of BGP for an IP Fabric Underlay)
- Added sane quota limits for Barbican
- Increased memcached memory and max connections, because 64 MB is too low for a reasonably sized cloud. Memory is now set to 2048 MB.
- Set reserved host memory on computes to prevent the OOM killer from triggering when too many instances are spawned on a compute
- Added tunables to control nova console TTLs, horizon session expiry, and overall keystone fernet lifetime
- Added a maintenance routine that lets the cache self-heal by removing expired packages and partial downloads
- Added a nova cell update routine to detect when changes to rabbitmq are made in nova.conf
- Improved OVN performance by increasing the election timers and reducing the number of workers.
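
The mine-based readiness check described in the orchestration notes above can be illustrated with a minimal sketch. It assumes that each endpoint publishes a single status value to the mine and that a query returns a mapping of minion id to that value. The `READY` value and the function names are hypothetical and are not taken from the project code.

```python
from typing import Dict, Iterable, List

# Hypothetical status value an endpoint publishes once it has finished building.
READY = "ready"


def pending(mine_data: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required minions that have not reported READY yet.

    mine_data maps minion id -> published status, the assumed shape of a
    mine query result. A minion missing from the mine counts as not ready.
    """
    return sorted(m for m in required if mine_data.get(m) != READY)


def safe_to_proceed(mine_data: Dict[str, str], required: Iterable[str]) -> bool:
    """Return True only when every required minion reports READY."""
    return not pending(mine_data, required)


if __name__ == "__main__":
    mine = {"mysql-01": "ready", "rabbitmq-01": "configure"}
    needed = ["mysql-01", "rabbitmq-01", "etcd-01"]
    print(safe_to_proceed(mine, needed))
    print(pending(mine, needed))
```

Because the state lives in the mine rather than in a one-time event, the check can be repeated at any point, and a missed notification cannot leave orchestration waiting forever.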
Bug fixes:
- ref: https://bugs.launchpad.net/horizon/+bug/1880188
- ref: saltstack/salt#56124
- ref: https://bugs.launchpad.net/neutron/+bug/1893656
- ref: https://bugzilla.redhat.com/show_bug.cgi?id=1805288
- ref: https://bugzilla.redhat.com/show_bug.cgi?id=1874086
- Removed a patch for https://bugs.launchpad.net/zun-ui/+bug/1797285 as it is now in upstream code
- Removed a patch for https://bugs.launchpad.net/zun/+bug/1762511 as it is now in upstream code
- removed a patch to set an upper constraint for which python-zunclient is installed on zun-ui for communication with zun-api which caused a
- Fixed a bug where the service on the Etcd spawning 0 endpoint was not running after full orchestration
- Improved how the salt custom module for cryptography.fernet generates fernet keys on keystone. The previous method checked all minions for the presence of the library, which was not ideal.
- Fixed a bug with multiple nfs-ganesha servers when using UCA
- Fixed a bug with neutron endpoint deps that did not differentiate between the linuxbridge and ovn backends
- Fixed a bug with rabbitmq. Added a nova cell update routine that detects rabbitmq changes relative to the initial rebuild, after nova.conf and neutron.conf have been updated with the new transport URL. The routine is followed by a service restart on neutron and nova.
- Fixed an issue with interface assignments. Interfaces now get assigned by the specified interface value in the pillar (e.g. using ens3 in the pillar will ensure that the bridge gets assigned to pci slot 0x03). Interface order no longer matters.
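
As an illustration of the interface assignment fix, the sketch below derives a PCI slot from a slot-based interface name such as `ens3`. It assumes the pillar value follows the `ens<slot>` naming form. The helper names are hypothetical and this is not the project's implementation.

```python
import re

# Matches slot-based names such as "ens3"; names with extra suffixes
# (for example a function index) are deliberately rejected in this sketch.
_SLOT_NAME = re.compile(r"^ens(\d+)$")


def pci_slot_for(interface: str) -> int:
    """Return the slot number encoded in an ens<slot> interface name."""
    match = _SLOT_NAME.match(interface)
    if match is None:
        raise ValueError(f"not a slot-based interface name: {interface!r}")
    return int(match.group(1))


def format_slot(slot: int) -> str:
    """Format a slot number as a two-digit hexadecimal string, e.g. 0x03."""
    return f"0x{slot:02x}"


if __name__ == "__main__":
    print(format_slot(pci_slot_for("ens3")))
```

Deriving the slot from the name itself, rather than from the position of the interface in a list, is what makes the ordering of interfaces irrelevant.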