Resilience starts and ends with the right culture and governance
Head of Engineering Pieter Lindeque suggests that organisations reconsider how they approach and integrate Operational Resilience and Disaster Recovery. He recommends a set of core principles and describes how we at Nationwide have addressed some of the challenges on our journey.
In a world of disruption, operational resilience has never been more important
Regulatory demands mean we must act now.
Following significant IT-related incidents in the financial services sector, and calls to action in successive Treasury Select Committee reports, regulators are proposing to bring operational resilience into their respective policy frameworks, building on existing obligations for Business Continuity and IT Disaster Recovery. According to the Bank of England’s Nick Strange:
“We regard operational resilience as an outcome, something we should strive for … and to do that we must manage operational risk effectively”
The Financial Conduct Authority and Prudential Regulation Authority consultation papers expect firms to complete, within three years, any investment required to stay within their impact tolerances. To achieve this, the identification, mapping and testing of core services needs to be completed within the next 18 months.
The following key principles have been established for the regulatory response:
- Identify - Identify important business services and set impact tolerances for these
- Map - Map resources to business services
- Test - Test ability to remain within impact tolerances using scenarios
- Invest - Take action to ensure operation within impact tolerances
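To make the four principles more concrete, here is a minimal, hypothetical sketch of how they relate: a service is identified with an impact tolerance, mapped to the resources it depends on, tested against a disruption scenario, and flagged for investment if it would breach its tolerance. The `BusinessService` class, the service and resource names, and all hour figures are illustrative assumptions, not Nationwide data or a regulatory method, and the sketch assumes, as a simplification, that a service stays down until its slowest affected resource is recovered.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessService:
    """Identify + Map: an important business service and its resources (hypothetical model)."""
    name: str
    impact_tolerance_hours: float  # assumed: maximum tolerable outage, in hours
    resources: set[str] = field(default_factory=set)


def within_tolerance(service: BusinessService,
                     failed: set[str],
                     recovery_hours: dict[str, float]) -> bool:
    """Test: does the service stay within tolerance under one scenario?

    Simplifying assumption: the outage lasts until the slowest
    affected resource has been recovered.
    """
    affected = service.resources & failed
    if not affected:
        return True  # scenario does not touch this service
    outage = max(recovery_hours[r] for r in affected)
    return outage <= service.impact_tolerance_hours


def services_needing_investment(services: list[BusinessService],
                                failed: set[str],
                                recovery_hours: dict[str, float]) -> list[str]:
    """Invest: names of services that breach their tolerance in the scenario."""
    return [s.name for s in services
            if not within_tolerance(s, failed, recovery_hours)]


# Hypothetical scenario: loss of one hosting location
services = [
    BusinessService("payments", 4, {"core-banking", "payments-gateway"}),
    BusinessService("online-banking", 8, {"core-banking", "web-frontend"}),
]
scenario_failed = {"core-banking", "payments-gateway"}
recovery = {"core-banking": 3, "payments-gateway": 6, "web-frontend": 1}

print(services_needing_investment(services, scenario_failed, recovery))
# prints ['payments']
```

In this made-up example, the payments service would be down for six hours against a four-hour tolerance, so it is the candidate for investment; online banking recovers in three hours and stays within its eight-hour tolerance.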
What is Operational Resilience? And how does Disaster Recovery fit into it?
At Nationwide Building Society we define Operational Resilience as the ability to prevent, respond to, recover from and learn from operational disruptions in order to maintain our propositions and services to members.
Disaster Recovery is the ability to recover critical services in the event of a disaster, such as the loss of a hosting location or a cyberattack, in order to maintain our propositions and services to members. As such, it is a key part of Operational Resilience.
The Nationwide response
Our Operational Resilience Strategy was approved by the board in 2018 and was quickly underpinned by a funded Operational Resilience programme of work that formed part of our broader technology strategy. In 2019, we approved our Disaster Recovery (DR) strategy and a three-year roadmap for taking our IT DR capability to the next level. Our IT DR Strategy is founded on the following principles:
- Customer driven – meeting customers’ expectations for service availability
- Business service aligned testing and recovery – regular mandatory testing that prioritises critical services alongside existing testing, with recovery processes aligned to business services
- Board endorsed – business service recovery is treated as an organisational priority
- Aligned to operational resilience strategy – service recovery is consistent with the associated strategies and principles
- Plan for failure – robust, tested capability in place to enable full recovery to alternate hosting location
- Resilience by design – design and build incorporate resilience as a priority
- Business-led governance – service focussed governance model, including DR standards and principles
- Effective execution – optimal resourcing with the necessary skills and capabilities for implementation, automation and future operation
- Clear ownership – business ownership of service recovery
- Iterative enhancements to recovery capability – progressive improvement through rehearsal, testing and remediation
This led to the development of a framework that consists of four elements, shown in the diagram. This framework underpins our entire IT DR Strategy.