6 Critical Lessons for IT Teams to Improve Internet Resilience
The internet is becoming ever more complex and challenging, and an outage can happen to anyone at any time. For e-commerce companies, an outage brings business to a standstill. Recent estimates put Amazon.com’s revenue loss from outages at $220,318 per minute.
You can’t fully prevent outages, but your IT team can take proactive measures to reduce the severity of downtime and limit the impact on revenue and reputation. Here are six critical lessons based on my experience as chief product and technology officer at Catchpoint, a company whose mission is to help the world’s digital-first companies, including the top four online retailers, achieve internet resilience:
- Ensure that every scheduled change has a rollback procedure in place. Changes to code or configuration, whether manual or automated, are the top cause of outages. This was the case in Microsoft’s five-plus-hour outage on Jan. 25, in which a network engineer implemented an unqualified change, leading to “a chain of events which culminated in [a] widespread impact.” What lessons can we draw? First, ensure you have a change management process. Second, whenever a change is made, monitor the key services, transactions and outputs that may be impacted if things go wrong. Third, learn from Microsoft’s takeaway: Implement “regular, ongoing mandatory operational training and attestation of following all SOPs.”
- Monitor from where it matters. One of the beauties of e-commerce is the opportunity to expand into markets beyond what’s possible with a brick-and-mortar footprint. Location matters just as much for monitoring: to understand your customers’ digital experience, you need an expansive set of vantage points that reveal performance from your customers’ geographical perspective.
- Look beyond code bugs and infrastructure load. Monitoring budgets used to focus on the parts of the system under the IT team’s control: containers, VMs, hardware, code, etc. Today, in our cloud-centric world where the internet is the new network, we need to monitor customer experience from the end-user perspective. That means Internet Performance Monitoring (IPM), which looks across the internet stack to identify any issues that could degrade customer experience.
- Manage, monitor and optimize your entire internet stack. The internet stack encompasses all the systems and subsystems your business relies on to deliver its services: apps, services and microservices; third-party APIs; datacenters or cloud providers; and CDNs and DNS providers. Achieve internet resilience by monitoring the output and performance of these components the same way you monitor your own systems. After all, your users (and your bottom line) will be impacted just the same.
- Monitor the APIs your site and systems rely on. For e-commerce, APIs are an essential part of making websites functional and interactive, enabling everything from fraud detection to payment functionality. However, every API is also a potential point of failure, which is why deep visibility is a must-have. You need to go beyond whether the API is up or down: ensure it responds correctly to its inputs and returns the expected outputs. Also, monitor from where the API is used. If your application code calls the API, monitor actively from that datacenter or cloud; if your webpage calls it, monitor from the same regional locations as your users.
- Be accountable for all four pillars of internet resilience. These are: (1) Reachability: Can I get to the website or service? (2) Availability: Is it up or down? (3) Performance: Is it fast or slow? (4) Reliability: Is it working? Is it consistent? If all four pillars hold, you can rest assured that your customers and employees are getting an optimal digital experience, born out of a resilient internet.
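The first lesson pairs every change with a rollback procedure and with close monitoring of whatever the change could affect. A minimal Python sketch of that loop follows, assuming hypothetical `apply_change`, `rollback` and health-check callables that would wrap your own deployment and monitoring tooling; the five-minute watch window and ten-second polling interval are placeholder defaults, not figures from this article.

```python
import time
from typing import Callable, Sequence


def apply_with_rollback(
    apply_change: Callable[[], None],
    rollback: Callable[[], None],
    health_checks: Sequence[Callable[[], bool]],
    watch_seconds: float = 300.0,
    interval_seconds: float = 10.0,
) -> bool:
    """Apply a change, watch key checks for a window, and roll back on failure.

    Returns True if the change is kept, False if it was rolled back.
    All three callables are hypothetical hooks into your own tooling.
    """
    apply_change()
    deadline = time.monotonic() + watch_seconds
    while True:
        # Poll every check; all() stops at the first failure, which triggers rollback.
        if not all(check() for check in health_checks):
            rollback()
            return False
        # Every check passed on this poll; keep the change once the window closes.
        if time.monotonic() >= deadline:
            return True
        time.sleep(interval_seconds)
```

In practice the health checks would be the key services, transactions and outputs the lesson names, such as a synthetic checkout transaction. Rolling back on the first failed poll trades the occasional unnecessary rollback for a shorter outage.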
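The four pillars can be made concrete by scoring synthetic test results separately for each vantage point, which also reflects the advice to monitor from where it matters. The sketch below assumes a hypothetical `ProbeResult` record produced by whatever monitoring tool runs the tests; the 1,000 ms latency budget and the rule that every run in a region must pass are illustrative choices, not thresholds from this article.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ProbeResult:
    """One synthetic test run from one vantage point (hypothetical schema)."""
    region: str
    dns_resolved: bool         # could the hostname be resolved?
    connected: bool            # did the TCP/TLS connection succeed?
    status: Optional[int]      # HTTP status code, None if there was no response
    latency_ms: Optional[float]
    content_ok: bool           # did the response contain what we expected?


def assess(results: Iterable[ProbeResult],
           max_latency_ms: float = 1000.0) -> Dict[str, Dict[str, bool]]:
    """Evaluate the four pillars separately for each region."""
    by_region: Dict[str, List[ProbeResult]] = {}
    for result in results:
        by_region.setdefault(result.region, []).append(result)

    report: Dict[str, Dict[str, bool]] = {}
    for region, runs in by_region.items():
        reachable = [r for r in runs if r.dns_resolved and r.connected]
        # Treat 2xx and 3xx as "up"; 4xx, 5xx or no response as "down".
        available = [r for r in reachable
                     if r.status is not None and 200 <= r.status < 400]
        report[region] = {
            # Reachability: could every run get to the service at all?
            "reachability": len(reachable) == len(runs),
            # Availability: did every run get a non-error answer?
            "availability": len(available) == len(runs),
            # Performance: were all successful runs within the latency budget?
            "performance": bool(available) and all(
                r.latency_ms is not None and r.latency_ms <= max_latency_ms
                for r in available
            ),
            # Reliability: was the content correct on every run, i.e. consistent?
            "reliability": len(available) == len(runs)
            and all(r.content_ok for r in runs),
        }
    return report
```

Reporting per region keeps a failure that only affects one market, such as a regional DNS problem, from being averaged away by healthy results elsewhere.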
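Going beyond whether an API is up or down means sending it a known input and checking the output. Here is a minimal standard-library sketch, assuming a hypothetical JSON endpoint that accepts a POST body and returns a flat JSON object; `check_api` and `find_mismatches` are illustrative names, not part of any monitoring product.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Dict


def find_mismatches(data: Any, expected: Dict[str, Any]) -> Dict[str, Any]:
    """Return {field: actual_value} for every expected field that does not match."""
    if not isinstance(data, dict):
        # The body parsed, but it is not the JSON object we expected.
        return {"<body>": data}
    return {key: data.get(key) for key, want in expected.items()
            if data.get(key) != want}


def check_api(url: str, payload: Dict[str, Any], expected: Dict[str, Any],
              timeout: float = 5.0) -> Dict[str, Any]:
    """POST a known input and verify the output, not just that the API is up."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            data = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return {"ok": False, "status": exc.code, "error": str(exc)}
    except (OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or a body that is not valid JSON.
        return {"ok": False, "status": None, "error": str(exc)}
    latency_ms = (time.monotonic() - start) * 1000.0
    mismatches = find_mismatches(data, expected)
    return {"ok": not mismatches, "status": status,
            "latency_ms": latency_ms, "mismatches": mismatches}
```

Run from the datacenter or cloud region that calls the API (or from your users’ regions for browser-side calls), a check such as `check_api("https://api.example.com/score", {"amount": 10}, {"approved": True})` reports both the HTTP status and any fields that came back wrong; the URL and field names there are placeholders.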
Dritan Suljoti is the chief product officer and co-founder at Catchpoint, the Internet Resilience Company™.
Drit leads Catchpoint’s engineering and product teams, applying a passion for building innovative technology and more than ten years’ experience leading research and development at Google and DoubleClick to continually improve Catchpoint’s solutions from a user’s point of view. An expert in digital advertising technology and marketplace dynamics, Drit holds three industry patents. He earned an MBA in operations management and marketing from Baruch College.