
MSP SLA Review: What Your Managed Service Provider Contract Really Says
Your Managed Service Provider Has an SLA. Have You Read It?
I want to ask you a direct question.
If your managed service provider became unreachable during a major incident right now - not slow to respond, but genuinely unreachable - what would you do?
Do you have a secondary contact? An escalation path? A documented process for what happens when the people you're paying to support your infrastructure aren't available when you need them most?
If the honest answer is "I'm not sure" or "I'd have to look into it," you're not alone. But you're also carrying a risk that most people underestimate until it becomes relevant.
I've seen it become relevant. It's not a comfortable situation to be in.
What Happened
A company I was close to experienced a significant incident. Not catastrophic by the standards of the major cases you read about in the news, but serious enough to cause real disruption to their operations and real anxiety for their clients.
They had a managed service provider. A relationship of several years, generally reliable and well-regarded. The incident happened on a weekend. The primary contact at the MSP was unavailable. The secondary contact number in the contract routed to a general support line with a 4-hour callback window.
Four hours. During a live incident.
The internal team did what they could. But they were operating at the edge of their knowledge, in a system that the MSP largely managed and that the internal team had limited direct access to. The documentation was held on the MSP's systems. The escalation path beyond the unavailable primary contact was unclear.
By the time a qualified MSP engineer was engaged, the incident had been running for several hours longer than it should have. The cost - in staff time, in client anxiety, in management pressure - was significant. And it was almost entirely a consequence not of the original technical failure, but of the support structure that was supposed to manage it.
What Most MSP SLAs Actually Say
I want to be clear that I'm not suggesting MSPs are unreliable. Many are excellent, and for companies of 15 to 200 people, outsourcing infrastructure management to a qualified MSP is often the right decision. The economics make sense, and the expertise is genuinely valuable.
The issue is not the MSP. The issue is the SLA - what it actually commits to, and whether the company that signed it fully understands what they have and haven't contracted for.
Most MSP SLA documents are long, technical, and written in language designed to protect the provider as much as to serve the client. Response time commitments are often specified in business hours - which means that a P1 raised at 4pm on a Friday might not receive a qualified response until Monday morning, depending on what the contract actually says. Escalation paths within the MSP organisation are rarely specified in client-facing contracts. The definition of a P1 may be narrower in the contract than in the client's understanding of it.
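To make the business-hours point concrete, here is a minimal sketch of how a response deadline works out when the SLA clock only runs during business hours. The terms are assumptions for illustration, not taken from any real contract: business hours of 09:00 to 17:00, Monday to Friday, and a 4-hour P1 response commitment. The function name response_deadline is hypothetical, and bank holidays are ignored for simplicity.

```python
from datetime import datetime, timedelta, time

# Assumed contract terms for illustration; check what your own SLA says.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)
BUSINESS_DAYS = {0, 1, 2, 3, 4}  # Monday=0 .. Friday=4 (bank holidays not handled)


def response_deadline(raised_at: datetime, window: timedelta) -> datetime:
    """Return when a response is due if the SLA clock only runs in business hours."""
    remaining = window
    current = raised_at
    while True:
        day_start = datetime.combine(current.date(), BUSINESS_START)
        day_end = datetime.combine(current.date(), BUSINESS_END)
        if current.weekday() in BUSINESS_DAYS and current < day_end:
            # The clock starts at the later of "now" and the start of business.
            clock_start = max(current, day_start)
            available = day_end - clock_start
            if remaining <= available:
                return clock_start + remaining
            remaining -= available
        # Nothing more counts today; move to midnight of the next calendar day.
        current = datetime.combine(current.date() + timedelta(days=1), time(0, 0))


raised = datetime(2025, 1, 3, 16, 0)  # a Friday, 4pm
print(response_deadline(raised, timedelta(hours=4)))  # 2025-01-06 12:00:00 (Monday)
```

Under these assumed terms, a "4-hour" response to a P1 raised at 4pm on Friday is not due until midday on Monday, 68 hours later in real time. Swapping in the hours and response window from your own contract shows the gap you are actually carrying.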
None of this is hidden. It's in the document. But in my experience, most clients sign the SLA, file it, and never read it again until something goes wrong.
The Three Things Your MSP Relationship Needs
Based on what I've seen work and what I've seen fail, there are three things that a properly structured MSP relationship requires - beyond the basic service description.
A defined escalation path with named contacts. Not a support email address. Named individuals, with direct contact numbers, at different levels of seniority within the MSP organisation. This should include a path that bypasses the standard support process in a genuine emergency - and that path should be tested before you need it.
Guaranteed response times that cover your actual risk window. If your business operates on weekends, your SLA response times need to cover those days. If a four-hour callback window during a live incident is unacceptable to your business - and for most businesses it is - a shorter response commitment needs to be written into the contract, not assumed. If your current MSP contract doesn't cover this, that's a conversation to have at renewal. Or before renewal, if the risk is significant enough.
Multiple communication channels with confirmed availability. During a major incident, you do not want to be dependent on a single communication channel - a support portal, a phone number, an email address - that may not be monitored or may be experiencing its own issues. Establishing secondary and tertiary contact methods, and confirming they are actively monitored, is basic incident readiness.
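One way to keep these three requirements honest is to hold the escalation path internally as a small register and check it for gaps from time to time. The sketch below is illustrative only: the Contact fields, the find_gaps rules, the 180-day "tested recently" threshold, and the placeholder contacts are all my assumptions, not a standard or anything your MSP will provide.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Contact:
    name: str                    # a named individual, not a shared mailbox
    role: str
    channel: str                 # e.g. "mobile", "email", "portal"
    covers_weekends: bool
    last_tested: Optional[date]  # when someone last confirmed this route works


def find_gaps(contacts, today, max_age=timedelta(days=180)):
    """List weaknesses in an escalation path (illustrative rules only)."""
    gaps = []
    if len({c.channel for c in contacts}) < 2:
        gaps.append("only one communication channel")
    if not any(c.covers_weekends for c in contacts):
        gaps.append("no contact covers weekends")
    for c in contacts:
        if c.last_tested is None or today - c.last_tested > max_age:
            gaps.append(f"{c.name}: contact route not tested recently")
    return gaps


# Placeholder entries; replace with the named contacts from your own contract.
example_path = [
    Contact("Primary engineer (placeholder)", "Account engineer", "mobile", False, date(2024, 3, 1)),
    Contact("Service desk manager (placeholder)", "Escalation", "mobile", False, None),
]
for gap in find_gaps(example_path, today=date(2025, 1, 6)):
    print(gap)
```

The example path fails on every count: a single channel, no weekend cover, and neither route tested within the assumed threshold. The value is less in the script than in having the register at all, somewhere your team can reach it without the MSP's systems.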
The Internal Side of the Equation
There is another dimension to this that is less about the MSP and more about the internal team.
When a company relies heavily on an MSP for infrastructure management, there is a risk that internal knowledge of the systems atrophies. The MSP manages it. The internal team doesn't need to understand it in detail. Over time, the internal team's ability to operate independently when the MSP is unavailable, even at a basic level, declines.
This creates a dependency that goes beyond the contractual relationship. If the MSP is unreachable, the internal team's options are limited not just by the lack of support, but by the lack of knowledge. They may not have access credentials. They may not have documentation. They may not have the context to make safe decisions about the infrastructure under pressure.
Managing this risk requires deliberate effort. It means ensuring that internal team members have sufficient understanding of the managed systems to take informed action in an emergency. It means maintaining access to critical systems independently of the MSP. It means having documentation that is held internally, not only by the service provider.
A Practical Step You Can Take This Week
Pull out your current MSP contract. Find the section on incident response and SLA commitments. Read it carefully, and ask yourself three questions.
What is the guaranteed response time for a P1, and does it cover weekends and bank holidays?
Who do I call if the primary contact is unavailable, and is that person's direct number documented somewhere my team can find it at 2am?
If our MSP were completely unreachable for four hours right now, what could we actually do independently?
If the answers to those questions make you uncomfortable, you have useful information. The time to act on it is not during the next incident. It's now, when the pressure is off, and the options are open.
Frequently Asked Questions
What should an MSP SLA include for incident response?
A robust MSP SLA should specify: defined P1/P2/P3 severity levels with clear criteria, response and resolution time commitments that cover out-of-hours and weekends, named escalation contacts at multiple seniority levels, communication obligations during active incidents, and penalties or remedies if SLA terms are breached.
How do I check if my MSP SLA covers weekend incidents?
Look for the section in your contract defining "business hours" and "response times." If response time commitments reference business hours without a separate out-of-hours provision, a P1 raised on a Saturday may not be covered by the same SLA timescales as one raised on a Tuesday. If this is a risk for your business, raise it at your next contract review.
What happens if my MSP is unreachable during a major incident?
This depends entirely on what your contract says. Most contracts have a general support number as a fallback, but this often results in longer response times. The best preparation is a named secondary contact at a senior level within the MSP organisation, agreed in advance, with a direct number confirmed as actively monitored.
How often should I review my MSP contract?
Formally at renewal, but practically whenever your risk profile changes significantly - for example, if your business starts operating at weekends, if a critical new system comes under MSP management, or if you experience an incident where the MSP response was slower than expected.
Next: what M&S, Co-op and JLR have in common with your next incident - and what the 2025 UK retail attacks reveal about communication failure at scale.
