Security Vision
When creating dynamic response plans, it is necessary to define criteria that validate the quality of the response algorithm generated for a specific type of information security (IS) incident.
The criteria are derived from the basic parameters of the incident that must be taken into account during investigation, response and post-incident activity. Incident work is much broader than the standard analysis of, and response to, correlated IS events. It covers:
1. what happens before the incident: preparing the infrastructure (defining and configuring security logs) and looking for weaknesses (vulnerabilities and non-compliance with security or regulatory requirements);
2. what happens during the incident: investigation and response;
3. what happens after the incident: analysing mistakes, strengthening the infrastructure, updating correlation rules.
Based on the incident management domain defined above, we can identify the following basic quality metrics for response plans.
Coverage by object
This metric assumes that every object involved in or affected by the incident has been analysed and assigned a reputation on the scale: malicious, suspicious, safe. It also assumes that, by the end of the analysis, containment or compensatory measures have been applied to all objects classified as suspicious or malicious to prevent further spread of malicious activity.
The strictness of the response depends on the reputation and is defined by a so-called heat map: the more dangerous the reputation of an object, the stricter the actions included in the incident investigation plan for that object.
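The coverage check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action names in the heat map, the function names and the input dictionaries are hypothetical and are not taken from any particular product.

```python
from enum import IntEnum

class Reputation(IntEnum):
    SAFE = 0
    SUSPICIOUS = 1
    MALICIOUS = 2

# Hypothetical heat map: the more dangerous the reputation,
# the stricter the candidate actions. Action names are illustrative.
HEAT_MAP = {
    Reputation.SAFE: [],
    Reputation.SUSPICIOUS: ["collect_artifacts", "increase_monitoring"],
    Reputation.MALICIOUS: ["isolate_host", "terminate_process"],
}

def uncovered_objects(reputations, applied_actions):
    """Objects rated suspicious or malicious with no measure applied."""
    return sorted(
        obj for obj, rep in reputations.items()
        if rep >= Reputation.SUSPICIOUS and not applied_actions.get(obj)
    )

def object_coverage(reputations, applied_actions):
    """Share of suspicious/malicious objects with at least one measure."""
    risky = [o for o, r in reputations.items() if r >= Reputation.SUSPICIOUS]
    if not risky:
        return 1.0  # nothing risky, so nothing is left uncovered
    covered = sum(1 for o in risky if applied_actions.get(o))
    return covered / len(risky)

reputations = {
    "host-a": Reputation.MALICIOUS,
    "host-b": Reputation.SUSPICIOUS,
    "host-c": Reputation.SAFE,
}
applied_actions = {"host-a": ["isolate_host"]}
print(uncovered_objects(reputations, applied_actions))
print(object_coverage(reputations, applied_actions))
```

In this example one of the two risky objects has no measure applied, so the plan would not yet satisfy the metric.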
Coverage by investigation phase
This metric verifies the presence and progression of all phases of work on an incident after dynamic playbook generation. All response plans should include the standard stages:
Enrichment. During this phase, data enrichment and analytics are used to discover all involved objects, identify the attacker object, and assign a reputation to each object: malicious, suspicious, safe.
Containment. This stage involves emergency actions against the most dangerous objects (reputation = malicious), such as terminating malicious processes, to prevent further spread of the attack using the security tools available in the organisation's infrastructure.
Investigation. This stage includes additional analytics and the collection of digital evidence and artefacts to confirm hypotheses about objects with reputation = suspicious, as well as the formation of additional conclusions (possibly expanding the attack surface or changing its type) that may not have been obvious at the first stages of work on the incident. The tasks of this stage include determining the complete coverage of the incident and checking the reputation of objects with third-party analytics services (e.g., confirming that a file hash belongs to a safe browser distribution that should not be blocked, or detecting that a sigma rule has triggered, which would indicate a malicious reputation for the process that triggered it).
Remediation. This stage involves countering the attacker and performing response actions that are relevant to the object and limited to the identified attack technique. Many actions can be performed on each object, but only a limited set of them will be effective and not redundant for a given attack technique. For example, in the case of external scanning (Active Scanning technique, T1595, MITRE ATT&CK® framework), the source host should be blacklisted in the firewall. In the case of internal scanning (Network Service Discovery technique, T1046), the source host may be a legitimate device of system administrators who perform inventory within the defined scanning windows, so here the response can interrupt the scanning process only if there are no current inventory tasks in the ITSM system.
Recovery. This stage involves a set of actions on the incident objects aimed at restoring their functionality if it was impaired during the attack or during the containment phase, when strict measures such as device isolation, KM lockdown and others are taken for security reasons.
Postincident. This stage is important for analysing mistakes in order to prevent similar information security incidents in the future. The actions performed at this stage are selected from the set of actions of type = postincident for the objects identified as the incident coverage. For example, if a vulnerability is present on the incident objects and there is proof that it was exploited, a task with increased criticality is created to address the vulnerability.
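The phase coverage check, and the technique-dependent choice of remediation from the scanning example, can be sketched as follows. This is a minimal sketch under assumptions: the playbook representation (a list of steps with a "phase" key), the action names and the fallback "escalate_to_analyst" are hypothetical and only illustrate the logic described above.

```python
# Standard stages, in the order listed above.
PHASES = ("enrichment", "containment", "investigation",
          "remediation", "recovery", "postincident")
ORDER = {name: i for i, name in enumerate(PHASES)}

def missing_phases(playbook):
    """Standard phases that have no step in the generated playbook."""
    present = {step["phase"] for step in playbook}
    return [p for p in PHASES if p not in present]

def phases_in_order(playbook):
    """True if steps never go back to an earlier standard phase."""
    idx = [ORDER[s["phase"]] for s in playbook if s["phase"] in ORDER]
    return all(a <= b for a, b in zip(idx, idx[1:]))

def choose_remediation(technique_id, source_is_internal, inventory_task_open):
    """Pick a remediation for the two scanning cases from the text."""
    if technique_id == "T1595" and not source_is_internal:
        return "block_source_on_firewall"
    if technique_id == "T1046" and source_is_internal:
        # A legitimate inventory scan is left alone; otherwise it is stopped.
        return "no_action" if inventory_task_open else "interrupt_scan"
    return "escalate_to_analyst"  # assumed fallback, not from the source

playbook = [
    {"phase": "enrichment", "action": "lookup_hash"},
    {"phase": "containment", "action": "terminate_process"},
    {"phase": "remediation", "action": "block_source_on_firewall"},
]
print(missing_phases(playbook))
print(phases_in_order(playbook))
print(choose_remediation("T1046", True, False))
```

A playbook that fails either check would be flagged before it is offered to the analyst.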
Successful completion of actions
This metric tracks the success/failure of executing an action from the collected dynamic response plan directly on the endpoint NWI or device, based on the subject/object of the operation.
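As a rough illustration, this metric can be computed from a log of execution results. The record layout (dictionaries with "action", "target" and "ok" keys) is an assumption made for the sketch and does not describe any specific system.

```python
def success_rate(results):
    """Share of executed actions that succeeded, or None if nothing ran."""
    if not results:
        return None
    return sum(1 for r in results if r["ok"]) / len(results)

def failed_actions(results):
    """(action, target) pairs that need a retry or manual handling."""
    return [(r["action"], r["target"]) for r in results if not r["ok"]]

results = [
    {"action": "isolate_host", "target": "host-a", "ok": True},
    {"action": "terminate_process", "target": "host-a", "ok": True},
    {"action": "block_account", "target": "user-1", "ok": False},
    {"action": "block_source_on_firewall", "target": "fw-1", "ok": True},
]
print(success_rate(results))
print(failed_actions(results))
```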
Use of an action ranking system
The ranking system applies statistical modelling to the construction of dynamic playbooks: actions on an object are selected from a repository of actions according to a probability distribution estimated from past use. In other words, the system selects the actions that users have applied most frequently at a given stage of the incident investigation for a given attack technique. In this way, the system accumulates expertise relevant to a particular IS department in a particular organisation. Even if the team changes or the level of competence of the employees decreases, the quality of the investigation will not drop as sharply: the system will offer the best practices previously used in the company, which it learned during the training phase.
In this case, the following can serve as reinforcement for learning:
- no cancellation of actions after execution within a single incident;
- successful closure of the incident by the team leader.
Principles of action selection for building dynamic playbooks using statistical modelling:
- ranking by frequency of application of the action in a particular type of incident at a particular stage of the investigation;
- sorting by time, which helps with the adoption of new practices: newer approaches to investigation and response are ranked higher;
- analysing the frequency of execution: some actions are always applied when analyses are performed;
- chain analyses: some actions always follow each other and it makes sense to propose them as a combined chain;
- the more infrequent the attack technique of an incident, the more actions should be included in the dynamic playbook due to insufficient information on the case;
- analysing unique actions for specific attack types and specific targets.
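Some of the principles above (frequency ranking, preference for recent practice, and chain analysis) can be sketched as follows. This is a minimal sketch, not the method of any particular system: the history record layout, the exponential half-life weighting, the "cancelled" flag used as a reinforcement filter, and all parameter names are assumptions introduced for illustration.

```python
from collections import Counter

SECONDS_PER_DAY = 86400.0

def rank_actions(history, technique, phase, now, half_life_days=90.0, top_n=3):
    """Rank actions by recency-weighted frequency for a technique and phase."""
    scores = Counter()
    for rec in history:
        if rec["technique"] != technique or rec["phase"] != phase:
            continue
        if rec.get("cancelled"):
            continue  # cancelled actions are not reinforced (assumption)
        age_days = (now - rec["ts"]) / SECONDS_PER_DAY
        # Each use counts less the older it is: weight halves every half-life.
        scores[rec["action"]] += 0.5 ** (age_days / half_life_days)
    return [action for action, _ in scores.most_common(top_n)]

def frequent_chains(sequences, min_support=2):
    """Pairs of actions that follow each other in at least min_support cases."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return [pair for pair, n in pairs.most_common() if n >= min_support]

now = 200 * SECONDS_PER_DAY
old = now - 180 * SECONDS_PER_DAY
history = [
    {"technique": "T1595", "phase": "remediation", "action": "old_block", "ts": old},
    {"technique": "T1595", "phase": "remediation", "action": "old_block", "ts": old},
    {"technique": "T1595", "phase": "remediation", "action": "new_block", "ts": now},
    {"technique": "T1046", "phase": "remediation", "action": "interrupt_scan", "ts": now},
]
print(rank_actions(history, "T1595", "remediation", now))
print(frequent_chains([["a", "b", "c"], ["a", "b"], ["b", "c", "a"]]))
```

With a 90-day half-life, two uses from 180 days ago weigh less in total than one use today, so the newer practice is proposed first. A shorter or longer half-life shifts the balance between established and new practice.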
Together, these metrics make it possible to reach the required level of quality in the automatically generated content and allow the system to validate itself, so that it operates effectively without redundant procedures and actions.