Cyborg -- nova is failing to start due to issues getting mdev information
We are seeing nova-compute fail to start on GPU nodes in dev because of a change in how libvirt reports the mdev device name. The error is shown below. This appears to be a bug that has already been reported, and changes have been made in nova to accommodate the libvirt update:
https://review.opendev.org/c/openstack/nova/+/838976
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager [None req-ce69e00d-9216-4b3b-a71d-3ef7351fba69 - - - - - -] Error updating resources for node gpu-7dbcc338-636c-5635-9d8d-476a7473b76c.: ValueError: badly formed hexadecimal UUID string
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager Traceback (most recent call last):
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 10097, in _update_available_resource_for_node
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self.rt.update_available_resource(context, nodename,
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 900, in update_available_resource
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager return f(*args, **kwargs)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 1007, in _update_available_resource
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self._update(context, cn, startup=startup)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 1285, in _update
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager return attempt.get(self._wrap_exception)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager raise value
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 1221, in _update_to_placement
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8641, in update_provider_tree
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager self._update_provider_tree_for_vgpu(
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 9044, in _update_provider_tree_for_vgpu
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager inventories_dict = self._get_gpu_inventories()
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7852, in _get_gpu_inventories
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager count_per_parent = self._count_mediated_devices(enabled_mdev_types)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7793, in _count_mediated_devices
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager mediated_devices = self._get_mediated_devices(types=enabled_mdev_types)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8049, in _get_mediated_devices
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager device = self._get_mediated_device_information(name)
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8030, in _get_mediated_device_information
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager "uuid": libvirt_utils.mdev_name2uuid(cfgdev.name),
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/virt/libvirt/utils.py", line 587, in mdev_name2uuid
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager return str(uuid.UUID(mdev_name[5:].replace('_', '-')))
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager File "/usr/lib/python3.8/uuid.py", line 171, in __init__
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager raise ValueError('badly formed hexadecimal UUID string')
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager ValueError: badly formed hexadecimal UUID string
2022-09-28 11:41:00.426 4210 ERROR nova.compute.manager
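For anyone trying to reproduce this outside of nova, the failing line in the traceback (`mdev_name2uuid` in nova/virt/libvirt/utils.py) can be exercised on its own. The sketch below is not the upstream patch. It assumes that the newer libvirt appends the parent PCI address to the mdev node device name (for example `mdev_<uuid>_0000_41_00_0`), which would explain the ValueError. The sample device names and the `mdev_name2uuid_tolerant` helper are hypothetical and only illustrate one way of ignoring the extra suffix.

```python
import uuid

# Hypothetical device names, used for illustration only.
OLD_STYLE_NAME = "mdev_b2107403_110c_45b0_af87_32cc91597b8a"
# Assumption: newer libvirt appends the parent PCI address to the name.
NEW_STYLE_NAME = "mdev_b2107403_110c_45b0_af87_32cc91597b8a_0000_41_00_0"


def mdev_name2uuid_current(mdev_name):
    """Same expression as the line shown in the traceback above."""
    return str(uuid.UUID(mdev_name[5:].replace('_', '-')))


def mdev_name2uuid_tolerant(mdev_name):
    """Hypothetical variant: keep only the 36-character UUID part.

    Drops the 'mdev_' prefix, restores the dashes, then discards anything
    that follows the canonical 8-4-4-4-12 UUID string.
    """
    candidate = mdev_name[5:].replace('_', '-')
    return str(uuid.UUID(candidate[:36]))


if __name__ == "__main__":
    print(mdev_name2uuid_current(OLD_STYLE_NAME))
    try:
        mdev_name2uuid_current(NEW_STYLE_NAME)
    except ValueError as exc:
        # Same failure mode as in the nova-compute log.
        print("current parsing fails:", exc)
    print(mdev_name2uuid_tolerant(NEW_STYLE_NAME))
```

If the names reported on the affected hosts (for example via `virsh nodedev-list --cap mdev`) do carry a suffix like this, that would confirm the cause. If they look different, the assumption above does not hold and the sketch does not apply.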