Microsoft: Here’s what caused our recent Azure cloud-computing services outage

Microsoft has revealed the root cause of the recent Azure outage, which lasted about an hour: a surge in Domain Name System (DNS) requests combined with a code defect.

Users reported that Azure Portal, Azure Services, Dynamics 365, and Xbox Live were inaccessible during the worldwide outage between 21:21 UTC and 22:00 UTC on April 1, 2021. Microsoft said in its root cause analysis report that the majority of services had recovered by 22:30 UTC.

While Microsoft quickly confirmed the outage was related to its DNS capabilities, the company’s final root cause analysis, published April 4, sheds more light on the cause: a previously unseen code defect in its DNS service that was triggered by excessive DNS client retries.

“Azure DNS servers experienced an anomalous surge in DNS queries from across the globe targeting a set of domains hosted on Azure,” Microsoft states.

“Normally, Azure’s layers of caches and traffic shaping would mitigate this surge. In this incident, one specific sequence of events exposed a code defect in our DNS service that reduced the efficiency of our DNS Edge caches.”

Microsoft’s DNS service was swamped as DNS clients retried requests, which added further pressure on the service. Microsoft notes that DNS client retries are considered legitimate DNS traffic, so its volumetric mitigation systems did not drop them, which in turn reduced the availability of its DNS service across multiple regions.
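
To illustrate why retries add pressure rather than relieve it, here is a minimal sketch of a retry feedback loop. It is a simplified toy model and not Microsoft's system: the function name `simulate_retry_load` and all of its parameters are hypothetical, and it assumes every failed query is retried on the next tick until it runs out of attempts.

```python
def simulate_retry_load(new_per_tick, capacity, max_retries, ticks):
    """Return the offered load (new queries plus retries) at each tick.

    Toy model, all names hypothetical:
      new_per_tick  - fresh queries arriving every tick
      capacity      - queries the service can answer per tick
      max_retries   - retries a client makes before giving up
      ticks         - number of ticks to simulate
    """
    # waiting[k] = queries that have failed k + 1 times and retry next tick
    waiting = [0.0] * max_retries
    offered_history = []

    for _ in range(ticks):
        # attempts[k] = queries making attempt number k + 1 in this tick
        attempts = [float(new_per_tick)] + waiting
        offered = sum(attempts)
        offered_history.append(offered)

        # Share of this tick's queries that go unanswered
        fail_ratio = max(0.0, 1.0 - capacity / offered) if offered else 0.0

        # Failed queries with retries left come back next tick;
        # queries that fail their final attempt are abandoned.
        waiting = [attempts[k] * fail_ratio for k in range(max_retries)]

    return offered_history


if __name__ == "__main__":
    # Healthy service: capacity exceeds demand, so no retries occur.
    print(simulate_retry_load(100, 200, 3, 4))
    # Degraded service: demand exceeds capacity, so retries pile up.
    print(simulate_retry_load(100, 50, 3, 4))
```

In this model, a service with spare capacity sees only the fresh queries each tick. Once capacity falls below demand, each tick's failures return as retries on top of the new queries, so the offered load keeps climbing above the original demand even though every individual retry looks like legitimate traffic.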

Microsoft says it mitigated the issue by updating the logic on the volumetric spike mitigation system to protect the DNS service from excessive client retries.    

The technology giant apologized to affected customers and explained that it had repaired the code defect so that all requests can be handled efficiently in the cache. It has also improved automatic detection and mitigation of anomalous traffic patterns.

This latest outage was not as lengthy as Microsoft’s 14-hour Azure outage in mid-March, which was attributed to an error that occurred in the rotation of keys used to support Azure AD’s use of OpenID.

By ZDNet Source Link
