We all welcome a bright sunny day when there’s not a cloud in the sky. We love those days, but when the equivalent happens with cyber clouds, it’s a different story.
Users of Microsoft’s Windows 10 operating system learned this in mid-March. Two weeks later, on April 1st, 2021, Microsoft Corp. experienced a massive cloud outage that took most of its Internet services off-line. During the outage, no one could access Microsoft’s Azure cloud services including Bing, Office 365, OneDrive, Skype Live, Teams and Xbox. This was no April Fool’s joke, and the impact was felt around the globe.
Last year, the almost overnight shift to “work from home” resulted in a surge in remote traffic, and stressed all the mega cloud computing platforms.
Not surprisingly, the major cloud providers all experienced major outages last year, which had a cascading effect on web applications and services, impacting businesses in myriad ways. We’ll get to that in a moment, but first…
What is a Cloud Outage?
It’s simply the term given to the period of time when the cloud infrastructure (including computing and networking capabilities, data storage and the interface for users to access virtualized resources) cannot be accessed or is not available for use. In some instances, it can also refer to performance that falls short of the agreed-upon SLA metrics, creating downtime for the user.
In last week’s blog post, we discussed how to protect yourself against power surges and grid failures. Coincidentally, power failures are the biggest cause of cloud outages. Indeed, some years back, a Gartner report suggested that power outages represented a larger threat to cloud usage than potential security breaches.
Leading Causes of Cloud Outages
- Power Failure: This one needs no further explanation, though you may wish to ask your cloud providers what failsafes they have in place in terms of UPS, generators and geographic redundancies.
- Cybersecurity Breaches: Despite every organisation’s best efforts, bad actors can sometimes worm their way into systems. Attacks such as Distributed Denial of Service (DDoS) can prevent users from accessing cloud services. Other types of malware and ransomware can cripple the cloud altogether.
- Hardware and Software Problems: Like any other network, cloud infrastructure comprises multiple hardware and software technologies. As such, cloud services are prone to the same issues that can cause problems with your network. The difference is the number of redundant systems, checks and balances, and personnel dedicated to keeping the cloud up and running.
- Networking and Collaboration Challenges: Cloud providers rely on telecommunications providers to deliver their services. They also have to contend with government policies in different parts of the world. When communication falls down, there can be problems. It’s good to know, however, that there has been far more collaboration in the past two years, so that load balancing can be worked out among multiple players across countries, and other issues can be addressed as well.
- Human Error: Despite the strong checks and balances mentioned above, it could, potentially, take one person to make one mistake and… Luckily, protocols are becoming tighter and tighter to ensure this does not happen. Indeed, none of the recent cloud outages were the result of someone’s inattentiveness.
Major Cloud Outages of 2020
- March 3, 2020 – Microsoft Azure: In this instance, a mechanical cooling system failure led to the outage that affected customers served by Microsoft’s East US data centre.
- June 10, 2020 – IBM Cloud: As a result of a third-party network provider overloading the IBM Cloud network with incorrect routing, customers in Washington, DC, Dallas, London, Frankfurt and Sydney could not access their regular cloud services, including Kubernetes, App Connect and Watson AI, for nearly 4 hours.
- November 25, 2020 – AWS: Customers served by the Northern Virginia location were impacted by a global outage that began at 8:15 ET and lasted 11 hours. Users, including 1Password, Adobe Spark, Flickr, Glassdoor and The Washington Post, lost a full business day and more; services such as Lambda, Managed Blockchain, Marketplace, MediaLive, WorkSpaces and several others were also affected. The cause: Problems related to Amazon Kinesis, which enables real-time processing of streaming data.
- December 14, 2020 – Google Cloud: For nearly an hour, services such as YouTube, Google Workspace and Gmail experienced interruptions as a result of an outage related to problems with the automated storage quota management system. The system’s authentication capacity was reduced and users around the world could not access services.
Each of these four leading companies experienced what users today consider significant downtime, each for a different reason. Each of these companies has also experienced cloud downtime in 2021.
The point? Given that cloud usage is expected to grow 50% from 2020 to 2024, there are actually two:
- IT professionals must include ways to protect their organisations in the case of cloud outages; this should become part of your Standard Operating Procedure Playbook.
- Corporations should consider private cloud for mission-critical applications.
Cloud Choice Considerations
- It’s important to know what your SLA requirements are for various types of workloads. For mission-critical IT workloads, public cloud may not be the right option.
- You should also consider any regulatory compliance standards by which your industry is governed before making final decisions.
- Explore a multi-cloud approach as part of your IT infrastructure strategy. That way, if one of your cloud providers’ data centres goes down, you have failover redundancies to ensure business continuity. Also, if you go with multiple vendors, you won’t get locked in and can leverage reduced pricing and market initiatives to your advantage.
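To make the failover idea in the last point a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production design: the endpoint URLs, the `/health` path and the function names are all hypothetical, and a real setup would more likely rely on DNS or load-balancer failover than on application code like this.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, timeout, HTTP error status or malformed URL:
        # treat all of these as "not healthy".
        return False


def pick_endpoint(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first endpoint that passes the health probe, in priority order."""
    for url in endpoints:
        if probe(url):
            return url
    return None  # every provider is down; fall back to the offline playbook


# Hypothetical endpoints for the same service hosted with two providers.
ENDPOINTS = [
    "https://primary-cloud.example.com/health",
    "https://secondary-cloud.example.com/health",
]

if __name__ == "__main__":
    # Simulate an outage at the primary provider with a stub probe,
    # so the demo runs without touching the network.
    primary_is_down = lambda url: "secondary" in url
    print(pick_endpoint(ENDPOINTS, probe=primary_is_down))
```

Passing the probe in as a parameter keeps the selection logic easy to test without a live outage, which is useful when you rehearse your failover procedure.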
Some Tips if Using Public Cloud
The use of “tips” is deliberate; we are still in the early days of learning how to manage cloud failure, and no definitive protocols have yet been established.
- Identify your SLA requirements for your different workloads and know how your public cloud provider stacks up. A hybrid version may better suit your needs, if the budget is not there for other options.
- Look at how you are using the cloud today. Assess which databases are essential for your operation and consider having redundant on-premises storage for these databases, with an alternate way for key personnel only to access them.
- Have an alternate way for employees to communicate. Zoom and Skype are two options that allow people to have free accounts. Bear in mind that no platform is immune: Zoom itself went down for four hours on August 24, 2020, affecting millions around the world.
That being said, get employees to install these ahead of time. Concurrently, you can create a private landing page that does not require login credentials; use this to communicate updates to employees. Obviously, you will need to train your employees ahead of time on the procedures to follow, and will need to test their ability to communicate using another platform.
- Ensure you have an alternate means of communicating not only with your employees, but with customers and other stakeholders so you can let them know the source of your business interruption, and what you are doing to serve them in the interim.
- Microsoft has had multiple outage problems that have affected users’ ability to authenticate and log in, but if they are already logged in, service continues to work. So… get your employees to log in to Teams and Microsoft 365 when they start their workday, and stay logged in until quitting time.
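As a rough aid for the first tip, the arithmetic behind an SLA availability figure can be sketched in a few lines of Python. The 99.9% target and the 30-day period below are assumptions chosen for illustration only; check your own provider’s SLA for the actual figure and measurement period. The outage durations are the ones cited earlier in this post.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def exceeds_budget(outage_minutes: float, availability_percent: float,
                   period_days: int = 30) -> bool:
    """True if a single outage alone uses up more than the period's downtime budget."""
    return outage_minutes > allowed_downtime_minutes(availability_percent, period_days)


if __name__ == "__main__":
    target = 99.9  # assumed SLA target, for illustration only
    budget = allowed_downtime_minutes(target)
    print(f"Downtime budget at {target}% over 30 days: {budget:.1f} minutes")

    # Durations taken from the outages described in this post.
    outages = {"AWS, November 25, 2020": 11 * 60, "Zoom, August 24, 2020": 4 * 60}
    for name, minutes in outages.items():
        print(name, "exceeds budget:", exceeds_budget(minutes, target))
```

Running the numbers this way for each workload makes it easier to see which ones can tolerate a public cloud outage and which ones need a redundant or on-premises option.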
Although public cloud can be an excellent option in many instances, for large enterprise-level organisations, there may be better options.
As our name suggests, we are cloud experts. If you’d like more information on how to better protect your cloud, and how to ensure you minimize your risk while leveraging cloud benefits, please feel free to contact us at [email protected] or (416) 429-0796 or 1.877.238.9944 (Toll Free), even if you’re only looking for a knowledgeable shoulder on which to bounce some ideas.