Ruslan Rakhmetov, Security Vision
Cybersecurity is not only about preventing cyber threats and responding to information security (IS) incidents; it is also about ensuring the availability of the information infrastructure and, therefore, of the entire business. IT infrastructure continuity is largely the task of the IT department, which keeps hardware and software running and monitors servers, data centres, Internet services and cloud infrastructure. However, when a failure or disaster occurs, IT and IS teams should join forces to restore the infrastructure and the business processes that depend on it as quickly as possible. In this article, we discuss business continuity planning and management, the recovery of infrastructure operations after incidents, accidents, failures and disasters, and options for automating the related IT and IS processes.
The high level of business digitalisation brings not only benefits in the form of reduced costs and faster operations, but also greater dependence on the reliability and resilience of the information infrastructure. In the early days of active business digitalisation, the typical fears were fires or flooding in server rooms and data centres, failures of air conditioning and power supply systems, and interruptions in access to Internet services. These were later joined by the risks of server equipment being seized and premises being sealed, as well as the actions of insiders who could deliberately delete information or disable equipment. Since the mid-2010s, ransomware and wipers have been added to the list: the former encrypts data and demands a ransom, the latter irrevocably destroys it. Attackers are becoming more aggressive and sophisticated: they destroy or encrypt backups as well, and stealthily corrupt data so that it remains readable but is meaningless and unusable. As a result, backups, along with the more general process of business continuity and business process recovery, are becoming a matter of business survival, because for almost any business the destruction of all data with no backups available is tantamount to collapse. To prevent this, it is important to develop business continuity and recovery plans in advance, test and update them, and put them into action promptly when an abnormal situation arises.
Let's start with the main standards related to business continuity and business process recovery:
- GOST R 53647 ‘Business Continuity Management’;
- ISO/IEC 27031:2011 ‘Information technology - Security techniques - Guidelines for information and communication technology readiness for business continuity’;
- GOST R ISO 22301-2021 ‘Reliability in technology. Business continuity management systems. Requirements’;
- ISO 22301:2019 ‘Security and resilience - Business continuity management systems - Requirements’;
- ISO 22313:2020 ‘Security and resilience - Business continuity management systems - Guidance on the use of ISO 22301’;
- ISO/TS 22317:2021 ‘Security and resilience - Business continuity management systems - Guidelines for business impact analysis’;
- ISO/TS 22331:2018 ‘Security and resilience - Business continuity management systems - Guidelines for business continuity strategy’;
- ISO/TS 22332:2021 ‘Security and resilience - Business continuity management systems - Guidelines for developing business continuity plans and procedures’;
- NIST SP 800-34 Rev. 1 Contingency Planning Guide for Federal Information Systems.
The following definitions and abbreviations are often used in the listed documents and in the literature on business continuity and business recovery:
- Emergency - an extraordinary event or abnormal situation, such as failures, accidents, cyberattacks, natural disasters, catastrophes, terrorist attacks, riots, epidemics, etc. Failures and accidents occur in software or hardware, cyberattacks target software, data and devices in the infrastructure, and natural disasters or catastrophes can affect individual company offices, server rooms and data centres;
- Business Continuity and Business Recovery;
- Business Continuity Planning (BCP) - planning and implementing measures to avoid an emergency;
- Business Continuity Management (BCM) - business continuity management;
- Disaster Recovery Planning (DRP) - planning and implementing measures to restore the business and infrastructure if an emergency does occur;
- Business Impact Analysis / Assessment (BIA) - analysing the negative impact of an emergency on business processes;
- Maximum Tolerable Downtime (MTD) or Maximum Tolerable Period of Disruption (MTPD) - the maximum acceptable period of business interruption, i.e. the length of business process downtime caused by an emergency during which the damage to the company does not exceed an acceptable level. The acceptable level of business impact is agreed and documented in advance for each business process, operation and asset, and by the time the MTD expires, business processes must have returned to the level that preceded the emergency;
- Recovery Time Objective (RTO) - the target recovery time, i.e. the time within which data, information systems and infrastructure are to be recovered from a backup, a backup data centre, the cloud, etc.;
- Recovery Point Objective (RPO) - the target recovery point, i.e. the point in time in the past to which data, systems and infrastructure can be restored. RPO characterises the maximum acceptable data loss after an emergency. For example, if the backup of a file server completed at 3 a.m. and the emergency happened at 11 a.m., the work of the last 8 hours will be lost (provided the emergency did not affect the backups themselves);
- Work Recovery Time (WRT) - the time spent on testing and preparing to resume business processes at a normal level after data, systems and infrastructure have been recovered.
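The arithmetic behind these metrics can be illustrated with a minimal Python sketch. It reproduces the file-server example from the RPO definition and adds a simple planning check. The function names, the calendar date and the target values are hypothetical, and the rule that RTO plus WRT should fit within MTD is a commonly used planning assumption rather than something stated in the definitions above.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Work lost if the system is restored from the last good backup."""
    if incident < last_backup:
        raise ValueError("incident precedes the last backup")
    return incident - last_backup


def fits_within_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """Assumed planning rule: recovery plus verification must not exceed MTD."""
    return rto + wrt <= mtd


# File-server example: backup completed at 03:00, emergency at 11:00
# (the date itself is arbitrary).
loss = data_loss_window(datetime(2024, 1, 15, 3, 0), datetime(2024, 1, 15, 11, 0))
print(loss)  # 8:00:00

# Hypothetical RPO target for this file server.
rpo_target = timedelta(hours=4)
print(loss <= rpo_target)  # False: a nightly backup cannot meet a 4-hour RPO

# Hypothetical targets: 2 h to restore, 1 h to verify, 4 h tolerable downtime.
print(fits_within_mtd(timedelta(hours=2), timedelta(hours=1), timedelta(hours=4)))  # True
```

If the measured loss exceeds the RPO target, either the backup interval has to shrink or the target has to be renegotiated with the business process owner.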
The process of business continuity and business process recovery management can be viewed from the perspective of Deming's PDCA cycle:
1. Planning:
1.1 Develop a business continuity and business recovery strategy and policy; compile and describe a list of emergencies, categorised by degree of impact, with an assessment of possible consequences.
1.2 Conduct a Business Impact Analysis (BIA), including interviews with, and questionnaires completed by, business process owners and information system managers, who can indicate how an emergency may affect a particular process in the context of the company's operations. The consequences of an emergency can be financial, legal, reputational, operational and so on. The purpose of the BIA is to set target values of the MTD, RTO, RPO and WRT metrics for each business process, operation and asset (device, system) for specific emergencies. When conducting a BIA, bear in mind that the same information system may have different metrics, and even a different level of business criticality, depending on the time period: for example, during tax and financial reporting periods, accounting systems have maximum criticality and an MTD of literally tens of minutes, while at other times the same systems have average criticality and an MTD measured in days.
1.3 Develop business continuity and recovery plans, as well as procedures and regulations for backup and for the recovery of data, systems and infrastructure (including the conditions for activating and deactivating recovery plans, a detailed description of recovery steps, and timelines).
1.4 Form recovery teams (DR Teams) and draw up communication and escalation matrices for emergencies. It is important to remember that a devastating emergency may make all information resources, the Internet and even electricity completely unavailable, so the documents should be printed out, and the paper versions should be checked regularly to ensure they are up to date.
1.5 Assess the adequacy of the available measures and tools for ensuring business continuity and business recovery. As part of this assessment, you can analyse the efficiency of the existing backup tools: it may be worth switching to LTO tape drives and new-generation tapes, stocking spare critical devices and components, or purchasing DRaaS (Disaster Recovery as a Service) and BaaS (Backup as a Service) services.
2. Implementation:
2.1 Implement the measures and tools adopted to ensure business continuity and business recovery, including technical and organisational measures.
2.2 Conduct training for recovery team members.
3. Evaluation:
3.1 Test business continuity and recovery plans, as well as procedures and regulations for backup and for the recovery of data, systems and infrastructure. Conduct theoretical cyber exercises (gathering the DR Teams, a joint walkthrough of documents, a readiness assessment).
3.2 Conduct practical cyber exercises that reproduce emergency scenarios of medium criticality: unavailability of one or several information systems, certain communication channels, or one or several branch offices. Assess whether the target values of the MTD, RTO, RPO and WRT metrics were achieved.
3.3 Conduct practical cyber exercises that reproduce emergency scenarios of the highest criticality: complete unavailability of all offices, of the data centre or cloud, or of current data, systems and infrastructure. Assess whether the target values of the MTD, RTO, RPO and WRT metrics were achieved.
4. Corrective Action:
4.1 Address deficiencies identified during the evaluation phase: finalise documents, fine-tune technical tools, reasonably adjust the target values of the MTD, RTO, RPO and WRT metrics together with business process owners and those responsible for information systems, and develop compensating measures.
4.2 Continuously improve the business continuity management and business process recovery process: use automation tools, assess the maturity level of the process and develop measures to raise it. The Business Continuity Maturity Model (BCMM), which describes 5 levels of process maturity, can be used for this assessment. To automate the business continuity process, you can use the Security Vision BCP product, which is designed to collect information about business processes and the resources they depend on, systematise continuity plans, set and track tasks for bringing the infrastructure into compliance, and conduct regular testing of plans with an assessment of whether key performance indicators are achieved.
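Two points from the cycle above lend themselves to a short illustration: the BIA targets that vary by period (the accounting-system example in the planning phase) and the assessment of whether an exercise achieved them (the evaluation phase). The Python sketch below shows one possible way to record and check them. All names and numeric values are hypothetical, the MTD check assumes that downtime equals recovery time plus verification time, and the code does not reflect how any particular product implements this.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Targets:
    """Target metrics agreed during BIA for one system in one period."""
    mtd: timedelta
    rto: timedelta
    rpo: timedelta
    wrt: timedelta


# Hypothetical values: the same accounting system is far more critical
# during tax and financial reporting periods than at other times.
ACCOUNTING_TARGETS = {
    "reporting": Targets(mtd=timedelta(minutes=40), rto=timedelta(minutes=20),
                         rpo=timedelta(minutes=15), wrt=timedelta(minutes=10)),
    "regular": Targets(mtd=timedelta(days=2), rto=timedelta(hours=24),
                       rpo=timedelta(hours=24), wrt=timedelta(hours=8)),
}


def missed_targets(targets: Targets, recovery: timedelta,
                   data_loss: timedelta, verification: timedelta) -> list[str]:
    """Return the names of the metrics that an exercise failed to meet."""
    missed = []
    if recovery > targets.rto:
        missed.append("RTO")
    if data_loss > targets.rpo:
        missed.append("RPO")
    if verification > targets.wrt:
        missed.append("WRT")
    # Assumption: total downtime is recovery time plus verification time.
    if recovery + verification > targets.mtd:
        missed.append("MTD")
    return missed


# Hypothetical measurements taken during a practical exercise.
measured = dict(recovery=timedelta(minutes=35), data_loss=timedelta(minutes=10),
                verification=timedelta(minutes=10))
print(missed_targets(ACCOUNTING_TARGETS["reporting"], **measured))  # ['RTO', 'MTD']
print(missed_targets(ACCOUNTING_TARGETS["regular"], **measured))    # []
```

The same measured results pass in a regular period and fail in a reporting period, which is why an exercise is most informative when it is evaluated against the strictest targets that apply to a system.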