VMware, the virtualization software vendor, has just published another patch for VMware ESXi, labelled version 7.0 Update 3c. The latest update resolves many issues that caused hosts to stop with a purple diagnostic screen (ESXi's equivalent of a "blue screen"). It also fixes missing NTP time synchronization and an error when saving time-server settings, and it addresses a bug that made cluster remediation through vSphere Lifecycle Manager take a very long time. For more interesting details, we encourage you to read the rest of the article.
Resolved issues:
Networking Issues
- If you use a vSphere Distributed Switch (VDS) of version earlier than 6.6 on a vSphere 7.0 Update 1 or later system and change the LAG hash algorithm, for example from L3 to L2 hashes, ESXi hosts might fail with a purple diagnostic screen. This issue is resolved in this release.
- You see packet drops for virtual machines with VMware Network Extensibility (NetX) redirection enabled. In vCenter Server advanced performance charts, you see an increasing packet drop count for all virtual machines that have NetX redirection enabled. However, if you disable NetX redirection, the count becomes 0. This issue is resolved in this release.
- An ESXi host might fail with a purple diagnostic screen during booting due to incorrect CQ to EQ mapping in an Emulex FC HBA. In rare cases, incorrect mapping of completion queues (CQ) to event queues (EQ) when the total number of I/O channels of an Emulex FC HBA is not an exact multiple of the number of EQs might cause booting of an ESXi host to fail with a purple diagnostic screen. In the backtrace, you can see an error in the lpfc_cq_create() method. This issue is resolved in this release. The fix ensures correct mapping of CQs to EQs.
- ESXi hosts might fail with a purple diagnostic screen due to a memory allocation issue in the UNIX domain sockets. During internal communication between UNIX domain sockets, a heap allocation might occur instead of cleaning ancillary data such as file descriptors. As a result, in some cases, the ESXi host might report an out of memory condition and fail with a purple diagnostic screen with #PF Exception 14 and errors similar to UserDuct_ReadAndRecvMsg(). This issue is resolved in this release. The fix cleans ancillary data to avoid buffer memory allocations.
- NTP optional configurations do not persist on ESXi host reboot. When you set up optional configurations for NTP by using ESXCLI commands, the settings might not persist after the ESXi host reboots. This issue is resolved in this release. The fix makes sure that optional configurations are restored into the local cache from ConfigStore during ESXi host bootup.
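A quick way to exercise this fix is to apply an NTP configuration with ESXCLI and read it back after a reboot. A minimal sketch, assuming the esxcli system ntp namespace available on ESXi 7.0 hosts and pool.ntp.org as a placeholder time server:
# Set a time server and enable the NTP service (server name is a placeholder)
esxcli system ntp set --server pool.ntp.org --enabled true
# Review the resulting configuration; run again after a reboot to confirm the settings persist
esxcli system ntp get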
- When you change the LACP hashing algorithm in systems with vSphere Distributed Switch of version 6.5.0, multiple ESXi hosts might fail with a purple diagnostic screen. In systems with vSphere Distributed Switch of version 6.5.0 and ESXi hosts of version 7.0 or later, changing the LACP hashing algorithm might cause an unsupported LACP event error due to a temporary string array used to save the event type name. As a result, multiple ESXi hosts might fail with a purple diagnostic screen. This issue is resolved in this release. To avoid the issue, make sure that vCenter Server systems of version 7.0 and later use a vSphere Distributed Switch of a version later than 6.5.0.
Installation, Upgrade and Migration Issues
- Remediation of clusters that you manage with vSphere Lifecycle Manager baselines might take a long time after updates from ESXi 7.0 Update 2d and earlier to a version later than ESXi 7.0 Update 2d. This issue is resolved in this release.
- After updating to ESXi 7.0 Update 3, virtual machines with physical RDM disks fail to migrate or power on on destination ESXi hosts. In certain cases, for example virtual machines with RDM devices running on servers with SNMP, a race condition between device open requests might lead to failing vSphere vMotion operations. This issue is resolved in this release. The fix makes sure that device open requests are sequenced to avoid race conditions. For more information, see VMware knowledge base article 86158.
- After upgrading to ESXi 7.0 Update 2d and later, you see an NTP time sync error. In some environments, after upgrading to ESXi 7.0 Update 2d and later, in the vSphere Client you might see the error Host has lost time synchronization. However, the alarm might not indicate an actual issue. This issue is resolved in this release. The fix replaces the error message with a log function for backtracing while preventing false alarms.
- ESXi hosts with virtual machines with Latency Sensitivity enabled might randomly become unresponsive due to CPU starvation. When you enable Latency Sensitivity on virtual machines, some threads of the Likewise Service Manager (lwsmd), which sets CPU affinity explicitly, might compete for CPU resources on such virtual machines. As a result, you might see the ESXi host and the hostd service become unresponsive. This issue is resolved in this release. The fix makes sure lwsmd does not set CPU affinity explicitly.
- In very rare cases, the virtual NVMe adapter (VNVME) retry logic in ESXi 7.0 Update 3 might cause silent data corruption. Retries rarely occur and they can potentially, though not always, cause data errors. The issue affects only ESXi 7.0 Update 3. This issue is resolved in this release.
- ESXi hosts might fail with a purple diagnostic screen during shutdown due to stale metadata. In rare cases, when you delete a large component in an ESXi host, followed by a reboot, the reboot might start before all metadata of the component gets deleted. The stale metadata might cause the ESXi host to fail with a purple diagnostic screen. This issue is resolved in this release. The fix makes sure no pending metadata remains before a reboot of ESXi hosts.
- Virtual desktop infrastructure (VDI) might become unresponsive due to a race condition in the VMKAPI driver. Event delivery to applications might be delayed indefinitely due to a race condition in the VMKAPI driver. As a result, the virtual desktop infrastructure in some environments, such as systems using NVIDIA graphics cards, might become unresponsive or lose connection to the VDI client. This issue is resolved in this release.
- ESXi hosts might fail with a purple diagnostic screen due to issues with ACPI Component Architecture (ACPICA) semaphores. Several issues in the implementation of ACPICA semaphores in ESXi 7.0 Update 3 and earlier can result in VMkernel panics, typically during boot. An issue in the semaphore implementation can cause starvation, and on several call paths the VMkernel might improperly try to acquire an ACPICA semaphore or to sleep within ACPICA while holding a spinlock. Whether these issues cause problems on a specific machine depends on details of the ACPI firmware of the machine. These issues are resolved in this release. The fix involves a rewrite of the ACPICA semaphores in ESXi, and correction of the code paths that try to enter ACPICA while holding a spinlock.
- ESXi hosts might fail with a purple diagnostic screen when I/O operations run on a software iSCSI adapter. I/O operations on a software iSCSI adapter might cause a rare race condition inside the iscsi_vmk driver. As a result, ESXi hosts might intermittently fail with a purple diagnostic screen. This issue is resolved in this release.
- Update to OpenSSL. The OpenSSL package is updated to version openssl-1.0.2zb.
- Update to the Python package. The Python package is updated to address CVE-2021-29921.
- You can connect to port 9080 by using restricted DES/3DES ciphers. With the OpenSSL command openssl s_client -cipher <CIPHER> -connect localhost:9080 you can connect to port 9080 by using restricted DES/3DES ciphers. This issue is resolved in this release. You can no longer connect to port 9080 by using the following ciphers: DES-CBC3-SHA, EDH-RSA-DES-CBC3-SHA, ECDHE-RSA-DES-CBC3-SHA, and AECDH-DES-CBC3-SHA (see the verification example after the VMware Tools list below).
- The following VMware Tools ISO images are bundled with ESXi 7.0 Update 3c:
  windows.iso: VMware Tools 11.3.5 supports Windows 7 SP1 or Windows Server 2008 R2 SP1 and later.
  linux.iso: VMware Tools 10.3.23 ISO image for Linux OS with glibc 2.11 or later.
The following VMware Tools ISO images are available for download:
- VMware Tools 11.0.6:
  windows.iso: for Windows Vista (SP2) and Windows Server 2008 Service Pack 2 (SP2).
- VMware Tools 10.0.12:
  winPreVista.iso: for Windows 2000, Windows XP, and Windows 2003.
  linuxPreGLibc25.iso: supports Linux guest operating systems earlier than Red Hat Enterprise Linux (RHEL) 5, SUSE Linux Enterprise Server (SLES) 11, Ubuntu 7.04, and other distributions with glibc version earlier than 2.5.
- solaris.iso: VMware Tools image 10.3.10 for Solaris.
- darwin.iso: supports Mac OS X versions 10.11 and later.
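To verify the cipher restriction from the port 9080 item above, the OpenSSL client command given in the release notes can be rerun with one of the listed ciphers. A minimal sketch, assuming an openssl client is available where you run it and that you replace localhost with the ESXi host name when testing remotely; on a patched host the handshake should be rejected:
# Try a TLS handshake to port 9080 with one of the restricted 3DES ciphers
openssl s_client -cipher DES-CBC3-SHA -connect localhost:9080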
- Virtual machines appear as inaccessible in the vSphere Client and you might see some downtime for applications. In rare cases, hardware issues might cause an SQLite database corruption that makes multiple VMs inaccessible and leads to some downtime for applications. This issue is resolved in this release.
- Virtual machine operations fail with an error for insufficient disk space on the datastore. A new datastore normally has a high number of large file block (LFB) resources and a lower number of small file block (SFB) resources. For workflows that consume SFBs, such as virtual machine operations, LFBs convert to SFBs. However, due to a delay in updating the conversion status, newly converted SFBs might not be recognized as available for allocation. As a result, you see an error such as Insufficient disk space on datastore when you try to power on, clone, or migrate a virtual machine. This issue is resolved in this release.
- vSphere Virtual Volumes snapshot operations might fail on the source volume or the snapshot volume on Pure storage. Due to an issue that allows duplication of the unique ID of vSphere Virtual Volumes, virtual machine snapshot operations might fail, or the source volume might get deleted. The issue is specific to Pure storage and affects Purity release lines 5.3.13 and earlier, 6.0.5 and earlier, and 6.1.1 and earlier. This issue is resolved in this release.
- You might see vSAN health errors for cluster partition when data-in-transit encryption is enabled. In the vSphere Client, you might see vSAN health errors such as vSAN cluster partition or vSAN object health when data-in-transit encryption is enabled. The issue occurs because when a rekey operation starts in a vSAN cluster, a temporary resource issue might cause key exchange between peers to fail. This issue is resolved in this release.
Virtual Machine Management Issues
- A race condition between live migration operations might cause the ESXi host to fail with a purple diagnostic screen. In environments with VMs that have 575 GB or more of reserved memory and do not use Encrypted vSphere vMotion, a live migration operation might race with another live migration and cause the ESXi host to fail with a purple diagnostic screen. This issue is resolved in this release. However, in very rare cases, the migration operation might still fail, even though the root cause of the purple diagnostic screen condition is fixed. In such cases, retry the migration when no other live migration is in progress on the source host, or enable Encrypted vSphere vMotion on the virtual machines.
Resolved Issues from Previous Releases
Networking Issues
- RDMA traffic using the iWARP protocol might not complete. RDMA traffic using the iWARP protocol on Intel x722 cards might time out and not complete. This issue is resolved in this release.
Installation, Upgrade and Migration Issues
- The /locker partition might be corrupted when the partition is stored on a USB or SD device. Due to the I/O sensitivity of USB and SD devices, the VMFS-L locker partition on such devices that stores VMware Tools and core dump files might get corrupted. This issue is resolved in this release. By default, ESXi loads the locker packages to the RAM disk during boot.
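The RAM-disk behaviour for the locker packages can be checked on a host. A minimal sketch, assuming the UserVars.ToolsRamdisk advanced option is the one controlling it on your build (the option name is an assumption here, so verify it against VMware documentation for your release):
# Inspect the advanced option assumed to control loading VMware Tools into a RAM disk
esxcli system settings advanced list -o /UserVars/ToolsRamdisk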
- ESXi hosts might lose connectivity after a brcmfcoe driver upgrade on Hitachi storage arrays. After an upgrade of the brcmfcoe driver on Hitachi storage arrays, ESXi hosts might fail to boot and lose connectivity. This issue is resolved in this release.
- After upgrading to ESXi 7.0 Update 2, you see excessive storage read I/O load. ESXi 7.0 Update 2 introduced a system statistics provider interface that requires reading the datastore stats for every ESXi host every 5 minutes. If a datastore is shared by multiple ESXi hosts, such frequent reads might cause read latency on the storage array and lead to excessive storage read I/O load. This issue is resolved in this release.
Virtual Machine Management Issues
- Virtual machines with enabled AMD Secure Encrypted Virtualization-Encrypted State (SEV-ES) cannot create Virtual Machine Communication Interface (VMCI) sockets. Performance and functionality of features that require VMCI might be affected on virtual machines with enabled AMD SEV-ES, because such virtual machines cannot create VMCI sockets. This issue is resolved in this release.
- Virtual machines might fail when rebooting a heavily loaded guest OS. In rare cases, when a guest OS reboot is initiated outside the guest, for example from the vSphere Client, virtual machines might fail, generating a VMX dump. The issue might occur when the guest OS is heavily loaded. As a result, responses from the guest to VMX requests are delayed prior to the reboot. In such cases, the vmware.log file of the virtual machine includes messages such as:
I125: Tools: Unable to send state change 3: TCLO error.
E105: PANIC: NOT_REACHED bora/vmx/tools/toolsRunningStatus.c:953.
This issue is resolved in this release.
Miscellaneous Issues
- Asynchronous read I/O containing a SCATTER_GATHER_ELEMENT array of more than 16 members with at least 1 member falling in the last partial block of a file might lead to ESXi host panic. In rare cases, in an asynchronous read I/O containing a SCATTER_GATHER_ELEMENT array of more than 16 members, at least 1 member might fall in the last partial block of a file. This might corrupt the VMFS memory heap, which in turn causes ESXi hosts to fail with a purple diagnostic screen. This issue is resolved in this release.
- If a guest OS issues UNMAP requests with large size on thin-provisioned VMDKs, ESXi hosts might fail with a purple diagnostic screen. ESXi 7.0 Update 3 introduced a uniform UNMAP granularity for VMFS and SEsparse snapshots, and set the maximum UNMAP granularity reported by VMFS to 2 GB. However, in certain environments, when the guest OS makes a trim or unmap request of 2 GB, such a request might require the VMFS metadata transaction to acquire locks on more than 50 resource clusters. VMFS might not handle such requests correctly. As a result, an ESXi host might fail with a purple diagnostic screen. A VMFS metadata transaction requiring lock actions on more than 50 resource clusters is rare and can only happen on aged datastores. The issue impacts only thin-provisioned VMDKs. Thick and eager zeroed thick VMDKs are not impacted. Along with the purple diagnostic screen, in the /var/run/log/vmkernel file you see errors such as:
2021-10-20T03:11:41.679Z cpu0:2352732)@BlueScreen: NMI IPI: Panic requested by another PCPU. RIPOFF(base):RBP:CS [0x1404f8(0x420004800000):0x12b8:0xf48] (Src 0x1, CPU0)
2021-10-20T03:11:41.689Z cpu0:2352732)Code start: 0x420004800000 VMK uptime: 11:07:27:23.196
2021-10-20T03:11:41.697Z cpu0:2352732)Saved backtrace from: pcpu 0 Heartbeat NMI
2021-10-20T03:11:41.715Z cpu0:2352732)0x45394629b8b8:[0x4200049404f7]HeapVSIAddChunkInfo@vmkernel#nover+0x1b0 stack: 0x420005bd611e
This issue is resolved in this release.
- The hostd service might fail due to a time service event monitoring issue. An issue in the time service event monitoring service, which is enabled by default, might cause the hostd service to fail. In the vobd.log file, you see errors such as:
2021-10-21T18:04:28.251Z: [UserWorldCorrelator] 304957116us: [esx.problem.hostd.core.dumped] /bin/hostd crashed (1 time(s) so far) and a core file may have been created at /var/core/hostd-zdump.000. This may have caused connections to the host to be dropped.
2021-10-21T18:04:28.251Z: An event (esx.problem.hostd.core.dumped) could not be sent immediately to hostd; queueing for retry.
2021-10-21T18:04:32.298Z: [UserWorldCorrelator] 309002531us: [vob.uw.core.dumped] /bin/hostd(2103800) /var/core/hostd-zdump.001
2021-10-21T18:04:36.351Z: [UserWorldCorrelator] 313055552us: [vob.uw.core.dumped] /bin/hostd(2103967) /var/core/hostd-zdump.002
This issue is resolved in this release.
Known issues:
Networking Issues
- Stale NSX for vSphere properties in vSphere Distributed Switch (VDS) 7.0 or ESXi 7.x hosts might cause host updates to fail. If you had NSX for vSphere with VXLAN enabled on a vSphere Distributed Switch (VDS) of version 7.0 and migrated to NSX-T Data Center by using NSX V2T migration, stale NSX for vSphere properties in the VDS or on some hosts might prevent ESXi 7.x host updates. The host update fails with a platform configuration error.
Workaround: Upload the CleanNSXV.py script to the /tmp dir in vCenter Server. Log in to the appliance shell as a user with super administrative privileges (for example, root) and follow these steps:
- Run CleanNSXV.py by using the command PYTHONPATH=$VMWARE_PYTHON_PATH python /tmp/CleanNSXV.py --user <vc_admin_user> --password <passwd>. The <vc_admin_user> parameter is a vCenter Server user with super administrative privileges and the <passwd> parameter is the user password.
For example: PYTHONPATH=$VMWARE_PYTHON_PATH python /tmp/CleanNSXV.py --user 'administrator@vsphere.local' --password 'Admin123'
- Verify that the following NSX for vSphere properties, com.vmware.netoverlay.layer0 and com.vmware.net.vxlan.udpport, are removed from the ESXi hosts:
  - Connect to a random ESXi host by using an SSH client.
  - Run the command net-dvs -l | grep "com.vmware.netoverlay.layer0\|com.vmware.net.vxlan.udpport".
If you see no output, the stale properties are removed.
To download the CleanNSXV.py script and for more details, see VMware knowledge base article 87423.
- The cURL version in ESXi650-202110001 and ESXi670-202111001 is later than the cURL version in ESXi 7.0 Update 3c. The cURL version in ESXi 7.0 Update 3c is 7.77.0, while ESXi650-202110001 and ESXi670-202111001 have the newer fixed version 7.78.0. As a result, if you upgrade from ESXi650-202110001 or ESXi670-202111001 to ESXi 7.0 Update 3c, cURL 7.77.0 might expose your system to the following vulnerabilities:
CVE-2021-22926: CVSS 7.5
CVE-2021-22925: CVSS 5.3
CVE-2021-22924: CVSS 3.7
CVE-2021-22923: CVSS 5.3
CVE-2021-22922: CVSS 6.5
Workaround: None. cURL version 7.78.0 will ship with a future ESXi 7.x release.
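To tell whether a host already runs ESXi 7.0 Update 3c, and therefore still carries cURL 7.77.0, the version and update level can be read directly on the host. A minimal sketch using standard ESXi commands:
# Print the ESXi version and build number
vmware -vl
# Show product, version, build, update and patch level
esxcli system version get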
Vendor release notes: VMware ESXi 7.0 Update 3c
Best regards,
The B&B Team
Bezpieczeństwo w biznesie