QorePortal was unavailable because of an issue with KeyVaults and POA Windows updates.
The Patch Orchestration Application (POA, a wrapper for the Azure Service Fabric Repair Manager service) is configured to install Windows updates every Wednesday. During this update, all Service Fabric nodes are restarted. After this restart, some QorePortal applications did not start. An investigation found an issue with KeyVaults, which was resolved, and the nodes hosting these applications then had to be restarted. However, some of the nodes could not restart and remained stuck in the 'updating' status. In the background, POA was still updating the cluster, so many nodes were in an unhealthy (updating) state. Total CPU usage increased and SF created more nodes one by one, but they all remained stuck in the 'creating' status. Even after the number of nodes was manually decreased, the affected nodes changed to the 'delete' status but stayed frozen. As a result, the healthy nodes had high CPU usage and services could not start properly.
The root cause is not fully clear, but it is related to the KeyVaults issue overlapping with the POA Windows updates.
A ticket was opened with Azure support, and we received the following information:
Recommendations:
Following the Azure support recommendations, a new cluster was created and all applications were deployed to it. After the DNS mapping was switched to the new cluster, the services started to work and synchronize. QorePortal then became available and worked normally.
QorePortal was temporarily unavailable and a new cluster was created.
We quickly resolved the KeyVaults issue and deployed a new cluster as Azure support recommended.
It took some time to deploy a new cluster with all applications.