On Feb. 29, 2012, many users of Microsoft cloud computing service Windows Azure found their systems unavailable, and, for some, the outage continued into the next day. Microsoft has apologized, issued refunds to affected customers and promised to learn from the incident.
The company says the problems were the result of a “Leap Day bug,” an error related to date/time values. In a blog post, Bill Laing, vice president of Microsoft’s Server and Cloud Division, wrote that the problem emerged from the system’s attempt to create “valid-to” dates one year in the future, which Azure computed as February 29, 2013. Since that day doesn’t exist, the certificate creation failed, and users ended up being shut out of their cloud systems.
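To make the failure concrete, here is a minimal Python sketch of the kind of date arithmetic described above. It is an illustration only: Azure's actual code is not shown in Laing's post, the function names `naive_valid_to` and `safe_valid_to` are hypothetical, and falling back to Feb. 28 is just one reasonable choice (March 1 would be another).

```python
from datetime import date


def naive_valid_to(start: date) -> date:
    """Add one year by bumping the year field only.

    Raises ValueError when start is Feb. 29, because the
    following year has no Feb. 29.
    """
    return start.replace(year=start.year + 1)


def safe_valid_to(start: date) -> date:
    """Add one year, falling back to Feb. 28 for a leap-day start.

    The Feb. 28 fallback is an assumption made for this sketch.
    """
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # Only reachable for Feb. 29: the next year is never a leap year.
        return start.replace(year=start.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_valid_to(leap_day)
    except ValueError as err:
        print("naive approach failed:", err)
    print("safe approach:", safe_valid_to(leap_day))
```

The naive version works on every other day of the year, which may help explain how a bug like this can go unnoticed until a leap day arrives.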
Then, Laing wrote, Microsoft inadvertently sent out an update package that wasn’t compatible with some companies’ host agents, which meant a delay in getting back to business.
The issue occurred at a time when many businesses are considering whether to go the cloud computing route, and for which operations. Azure is a prominent name in the space, along with products from Amazon, Google and other companies.
It may not be surprising that there would be bugs in cloud systems. They’re complicated, and pretty new. Windows Azure only became generally available in 2010. Then again, there are also plenty of potential pitfalls in storing data and software on-site. Keeping multiple computers updated with new software and security systems isn’t easy, and local servers—not to mention employees’ laptops—are vulnerable to all sorts of disasters. IT support firms can clarify these issues and help businesses choose the best tools—whether local or virtual—for their needs.
In response to the Leap Day problems, Microsoft has promised a number of improvements to its methods. Among other things, Laing wrote, the company will test its offerings better to avoid problems related to time and date values, work to detect errors more quickly and make customers’ dashboard interfaces more consistently available. The company also pledged to improve customer support and communications tools so that, in the event of an incident, those affected will have quicker access to better information about what’s going on.
Meanwhile, Microsoft is giving a 33 percent credit for the affected billing months for all users of the affected services—Azure Compute, Access Control, Service Bus and Caching—even if their service wasn’t interrupted.
Microsoft must be hoping the slip-up won’t hurt Azure, especially since cloud computing is more and more on the minds of businesses that are choosing how to deal with their data most simply and affordably. In its quest to win over those potential customers, the company also recently cut the price of Azure, following Google and Amazon, which have done the same for their cloud offerings.