MITRE publication ‘11 Strategies of a World-Class Cybersecurity Operations Center’. Strategy #8: ‘Leverage tools to support analyst workflow’

26.06.2023

Ruslan Rakhmetov, Security Vision

When a SOC works with many different technologies, it needs an architecture that supports the work of its team members and integrates data from different sources (IT and information security (IS) systems, monitoring systems, cyber threat intelligence) in order to turn data into information and information into knowledge. Each technology used in a SOC, such as SIEM, TIP or EDR, provides its own interface, so analysts have to switch constantly between the consoles of different solutions. The authors of the publication point out that the best strategy is to reduce the number of management consoles, integrate the SOC technologies with one another, and automate and centrally manage repetitive tasks, escalation procedures and incident handling. Strategy #8 describes the SOC technologies that help achieve these goals.

1. SIEM systems.

Security Information and Event Management (SIEM) solutions make it possible to process millions of IS events and extract value from them for the SOC. As with any SOC technology, it is important not only to purchase an expensive solution, but also to invest in its configuration, administration and support. SIEM solutions collect, aggregate, filter, store, categorise, correlate and display data relevant to IS tasks, and perform analysis both in real time and on accumulated data. By picking out significant IS events from the data streams of various sources, a SIEM system enables a few SOC analysts to quickly process a large volume of security-related information. As a result, the SOC can identify complex attacks by APT groups, analyse earlier incidents, handle incidents at every stage of their lifecycle, use cyber threat intelligence, proactively hunt for threats, maintain situational awareness for the customer company, and support the company's compliance with cyber security requirements.

A SIEM system helps to detect an IS incident (data collection, normalisation, correlation), verify it (enrichment, elimination of false positives, handover for further analysis) and respond to it. SOCs and IS departments often adopt a SIEM under the misconception that the technology will reduce personnel costs. In practice, a SIEM often reveals incidents that previously went unnoticed, which can increase the workload on IS analysts and, consequently, the number of staff required. Similarly, SOC automation solutions such as SIEM or SOAR systems are unlikely to eliminate the need for L1 analysts altogether; rather, they increase efficiency and effectiveness by handing analysts incidents that have already been cleaned, prioritised and enriched by automation tools.

The main functions and components in modern SIEM systems are:

1.1 Data Collection:

Data is collected by a collector component (agent) that is installed locally on the target system whose IS audit events are needed, connects to the target system remotely to retrieve data, or accepts data that the target system sends to it. The collector can filter, deduplicate and cache data, control the data flow and prioritise processing so that events are delivered efficiently and without loss, using connection authentication and traffic encryption.
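The filtering and deduplication performed by a collector can be illustrated with a minimal Python sketch. The `Collector` class, the event field names (`host`, `source`, `message`, `severity`) and the severity threshold are assumptions made for this illustration; they do not come from the publication or from any particular SIEM product.

```python
import hashlib
from collections import OrderedDict


class Collector:
    """Toy collector: drops low-severity events and suppresses exact duplicates."""

    def __init__(self, min_severity=3, cache_size=1000):
        self.min_severity = min_severity
        self.cache_size = cache_size
        self._seen = OrderedDict()  # fingerprint -> None, oldest first

    def _fingerprint(self, event):
        # Hash the fields that define "the same event" (hypothetical choice).
        key = "|".join(str(event.get(f, "")) for f in ("host", "source", "message"))
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def accept(self, event):
        """Return True if the event should be forwarded to the SIEM."""
        if event.get("severity", 0) < self.min_severity:
            return False  # filtered out as noise
        fp = self._fingerprint(event)
        if fp in self._seen:
            self._seen.move_to_end(fp)
            return False  # duplicate of a recently seen event
        self._seen[fp] = None
        if len(self._seen) > self.cache_size:
            self._seen.popitem(last=False)  # evict the oldest fingerprint
        return True


events = [
    {"host": "web01", "source": "sshd", "message": "Failed password for root", "severity": 5},
    {"host": "web01", "source": "sshd", "message": "Failed password for root", "severity": 5},
    {"host": "web01", "source": "cron", "message": "job started", "severity": 1},
]
collector = Collector()
forwarded = [e for e in events if collector.accept(e)]
```

In this example only the first sshd event is forwarded: the second is a duplicate and the cron event falls below the severity threshold. A real collector would also buffer events on disk and send them over an authenticated, encrypted channel, which this sketch leaves out.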

1.2 Data normalisation and storage:

In many SIEM solutions, data is collected in a central repository that supports fast search queries. SIEM nodes that provide data correlation and storage were traditionally offered as hardware appliances or software installations, but virtualised solutions, cloud-based IaaS images and SaaS offerings have recently become more common. Normalisation (i.e. parsing incoming data, selecting the significant properties of events and structuring the processed information) can be performed when stored events are read, e.g. when a search query is run in the SIEM (‘schema on read’), or when events are written to the SIEM (the more traditional ‘schema on write’ approach). Commercial SIEM solutions differ in the number of parsers and out-of-the-box integrations, in how the incoming event stream is licensed, and in the storage methods they support (e.g. separate storage for ‘hot’ and ‘cold’ data). In large infrastructures, the SIEM storage and processing subsystem may be a geographically distributed cluster with multiple nodes. In distributed environments and large companies, it may be necessary to run search queries, correlation and analytics across all nodes of a single large-scale SIEM system, or even across different SIEM and Log Management systems.
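The difference between ‘schema on write’ and ‘schema on read’ can be shown with a short Python sketch. The log line format, the regular expression and the field names are hypothetical and serve only to illustrate the two approaches.

```python
import re

# Hypothetical sshd-style log line format, used only for this illustration.
LINE_RE = re.compile(
    r"(?P<host>\S+) sshd: (?P<outcome>Failed|Accepted) password "
    r"for (?P<user>\S+) from (?P<src_ip>\S+)"
)


def normalise(raw):
    """Parse a raw line into structured fields; keep the raw text either way."""
    match = LINE_RE.search(raw)
    fields = match.groupdict() if match else {}
    return {"raw": raw, **fields}


raw_lines = [
    "web01 sshd: Failed password for root from 203.0.113.7",
    "web01 sshd: Accepted password for alice from 198.51.100.4",
]

# Schema on write: parse once at ingestion and store structured records.
store_on_write = [normalise(line) for line in raw_lines]
failed_on_write = [r for r in store_on_write if r.get("outcome") == "Failed"]

# Schema on read: store raw lines and parse only when a query runs.
store_on_read = list(raw_lines)
failed_on_read = [r for r in map(normalise, store_on_read) if r.get("outcome") == "Failed"]
```

Both queries return the same record. The trade-off lies in where the parsing cost is paid: schema on write spends it once at ingestion and makes searches cheaper, while schema on read keeps ingestion simple and allows a parser to be changed later without re-ingesting data.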

1.3 Data Analytics:

Data analytics and incident detection can be performed using the following main approaches:

- Real-time correlation and incident detection;

- Correlation and detection of incidents in previously stored data;

- Machine learning techniques for incident detection.

The correlation and analytics core of the SIEM system categorises incoming data, prioritises incoming events according to the configured correlation rules and enrichment data (e.g. by comparing events with vulnerability scan reports or by applying machine learning techniques), and executes a limited set of response actions (e.g. sending an email alert to SOC operators, creating a case in the SIEM system, running a script). More advanced response actions are performed in SOAR systems.
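As an illustration of real-time correlation, the following Python sketch implements one simple rule: raise an alert when a single source IP produces several failed logins within a sliding time window. The rule name, threshold, window length and event fields are assumptions chosen for the example, not rules taken from the publication.

```python
from collections import defaultdict, deque


class FailedLoginRule:
    """Alert when one source IP produces `threshold` failed logins within `window` seconds."""

    def __init__(self, threshold=5, window=60):
        self.threshold = threshold
        self.window = window
        self._times = defaultdict(deque)  # src_ip -> timestamps of recent failures

    def process(self, event):
        """Feed one normalised event; return an alert dict or None."""
        if event.get("outcome") != "Failed":
            return None
        times = self._times[event["src_ip"]]
        times.append(event["ts"])
        # Drop failures that have fallen out of the sliding window.
        while times and event["ts"] - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # avoid re-alerting on every further event
            return {"rule": "failed_login_burst", "src_ip": event["src_ip"], "ts": event["ts"]}
        return None


rule = FailedLoginRule(threshold=3, window=60)
stream = [
    {"ts": 0, "src_ip": "203.0.113.7", "outcome": "Failed"},
    {"ts": 10, "src_ip": "203.0.113.7", "outcome": "Failed"},
    {"ts": 15, "src_ip": "198.51.100.4", "outcome": "Accepted"},
    {"ts": 20, "src_ip": "203.0.113.7", "outcome": "Failed"},
]
alerts = [a for a in map(rule.process, stream) if a is not None]
```

The same rule could be run over previously stored events by replaying them in timestamp order, which corresponds to the second approach in the list above.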

1.4 Search queries, interaction, work of analysts:

SOC staff run search queries in the SIEM console, monitor the alerts that the SIEM raises, and assess the state of cyber security using visualisation tools (dashboards, charts, graphs). SIEM solutions used in SOCs should make it possible to automate frequently run search queries and the visualisation of the customer company's IS state, which makes results reproducible and simplifies the exchange of information with colleagues. As a rule, several users work in a SIEM system simultaneously, so it must serve all of their search queries and also let them add notes and tags, create cases/tickets, send notifications and escalate incidents.

1.5 Flexible Integration:

Import and export of SIEM content (correlation rules and settings) and of the IS events themselves should be provided for in advance; this may be required, for example, when preserving data on critical incidents or responding to law enforcement requests. SIEM solutions for large infrastructures should also support fault tolerance, clustering and forwarding of events to other SIEMs (e.g. that of a parent organisation).

The authors of the publication point out that, since a SIEM system is probably the most expensive one-time purchase for a SOC, the feasibility of such a purchase should be assessed first by evaluating the SOC's needs and answering the following questions:

- Do the current needs of the SOC centre exceed the current capabilities of the technology being used?

- Does the SOC analyse a significant proportion of information in near real-time?

- Does the SOC need to process information from various devices on the network in near real-time?

- Is the SOC willing to devote resources to administering and customising the SIEM system?

- What will be the usage scenarios for the SIEM system?

- What benefits to the SOC centre functions will the use of a SIEM system bring?

- Does the functionality and architecture of the SIEM system match the SOC's medium- and long-term development plans?

Before deciding to purchase a SIEM system, it is important not only to make a detailed functional comparison of competing solutions, but also to evaluate the vendor's licensing model: licensing may be based on the number of installation nodes, the volume of processed data (incoming traffic volume or number of events processed per unit of time), the number of users, the number or type of data sources, or additional functionality. It is important to take into account licence limits, which can be reached during a sudden surge of events (e.g. a serious incident) or because the planned load on the SIEM system was calculated incorrectly. You should also consider the cost of SIEM support and maintenance, which can amount to 20-30% of the initial SIEM cost annually. For cloud-based SIEMs offered on a SaaS model, such limits and initial performance calculations may matter less, since the SaaS model implies flexible on-demand scalability. However, the cost of a cloud SIEM can grow uncontrollably because of spikes in resource consumption, e.g. when redundant ‘noisy’ sources are connected or the SIEM is not fine-tuned.
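The kind of load and cost estimate described above can be sketched as simple arithmetic in Python. All input values (event rate, surge multiplier, event size, licence limit, initial cost) are illustrative assumptions rather than vendor figures; only the 20-30% maintenance range comes from the text.

```python
def sizing(avg_eps, peak_factor, avg_event_bytes, licence_eps, initial_cost, maintenance_rate):
    """Rough SIEM sizing figures derived from assumed load and licence parameters."""
    peak_eps = avg_eps * peak_factor
    daily_gb = avg_eps * avg_event_bytes * 86_400 / 1e9  # 86,400 seconds per day, decimal GB
    return {
        "peak_eps": peak_eps,
        "daily_gb": daily_gb,
        "peak_exceeds_licence": peak_eps > licence_eps,
        "annual_maintenance": initial_cost * maintenance_rate,
    }


# All inputs below are illustrative assumptions.
estimate = sizing(
    avg_eps=2_000,          # average events per second
    peak_factor=3,          # surge multiplier during a serious incident
    avg_event_bytes=500,    # average size of one event
    licence_eps=5_000,      # licensed events-per-second limit
    initial_cost=100_000,   # initial SIEM cost, in any currency
    maintenance_rate=0.25,  # within the 20-30% range mentioned above
)
```

With these inputs the average load stays within the licence, but a threefold surge would exceed it, which is exactly the situation the licence limit warning refers to.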

The authors of the publication stress the importance of fine-tuning the SIEM system, developing use cases, and creating infrastructure-specific customised content and analytics, without which investment in an expensive SIEM system may be ineffective or even pointless. In addition, the publication provides the following recommendations for using and configuring SIEM systems in SOC centres:

- Maintain the ‘health’ of data sources and the high quality of the information they transmit: it is recommended to regularly monitor the list of sources and identify those that have stopped transmitting events, as well as to identify the reasons for changes in event content;

- Manage SIEM content creation: customised, fine-tuned correlation rules, analytics, search queries, reports and dashboards provide the maximum benefit from SIEM systems, so it is important not only to create such content, but also to manage it centrally; for this purpose, several SOC employees can be assigned the role of ‘SIEM content manager’;

- Optimise search queries: inappropriate search queries can lead to excessive load on the SIEM, so you should alert and educate SIEM users about such queries;

- Maintain a knowledge base: a knowledge base of the protected infrastructure (users, devices, services) should be managed to improve the quality of the SOC's work with the SIEM system;

- Manage alerts and cases created in the SIEM: a person should be assigned to control the quality of closure of cases (incidents) created in the SIEM, and SOAR and Case Management class systems can be used for deeper incident management.
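The first recommendation, monitoring the ‘health’ of data sources, can be sketched as a check for sources that have gone silent. The source names, timestamps and the one-hour threshold below are hypothetical.

```python
from datetime import datetime, timedelta


def silent_sources(last_seen, now, max_silence):
    """Return the names of sources whose latest event is older than `max_silence`."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_silence)


now = datetime(2023, 6, 26, 12, 0)
last_seen = {
    "firewall-01": now - timedelta(minutes=2),
    "dc-01": now - timedelta(hours=3),
    "proxy-01": now - timedelta(minutes=50),
}
stale = silent_sources(last_seen, now, max_silence=timedelta(hours=1))
```

Here only `dc-01` is reported. In practice the acceptable silence period would probably differ per source type, since a domain controller and a rarely used application do not log at the same rate.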

2. Log Management systems.

Log Management (LM) solutions can be used as a simpler and cheaper alternative to SIEM systems: such products also aggregate and store events and support search and reporting. SIEM solutions were created to solve IS problems and for use in SOCs, so out of the box they support a variety of incident detection methods and relevant analytics, while LM systems are used in more general IT scenarios. The difference in functionality between LM and SIEM solutions is gradually blurring, so it is important to analyse how the specific functions the SOC plans to use are implemented. The authors of the publication point out that a SOC can be built on LM systems if other IS technologies are used alongside them. According to the authors, small SOCs can do quite well with a combination of LM and EDR solutions, while full-fledged SIEM systems may be too expensive for them to purchase and maintain, so the economic case for SIEM in small SOCs remains open to question. The growing popularity of ‘SIEM as a service’ offerings may, however, change this.

3. User and Entity Behaviour Analytics systems.

User and Entity Behaviour Analytics (UEBA) systems identify deviations in the behaviour of users, devices and other entities in the infrastructure from their normal, established patterns of activity; such deviations may indicate malicious activity. UEBA systems focus on identifying insider threats and the actions of intruders who have already infiltrated the infrastructure. UEBA solutions can be stand-alone products or can complement the functionality of SIEM systems or other security tools. UEBA solutions rely on the following:

- Usage scenarios: detecting non-standard user behaviour, insiders, and external intruders who operate from compromised accounts or move laterally across the network; detecting leaks and exfiltration (theft) of valuable information; prioritising and enriching incident data based on risk scores for user and entity behaviour; identifying similar behaviour patterns and relationships between users and entities;

- Data sources: authentication and access control systems, configuration management systems, personnel data on employees, firewalls, IDS/IPS systems, DPI systems (Deep Packet Inspection), data from endpoint protection systems (EDR solutions, antivirus);

- Analytics: use of machine learning methods (supervised and unsupervised), rule-based detection.

UEBA systems give the best results in infrastructures with a large number of users whose behaviour patterns are well defined, because in that case UEBA can detect deviations more accurately. Before deciding whether to purchase a UEBA solution, the SOC should assess its readiness to use the technology, identify data sources for UEBA, assess how well UEBA can integrate with the existing security tools, ask the vendor about the training time of the UEBA model and its false positive and true positive rates, and understand how interpretable the conclusions of the UEBA system will be for SOC analysts.
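The idea of scoring deviations from an entity's own baseline can be illustrated with a deliberately simple statistical sketch in Python. Commercial UEBA products use far richer models; the z-score approach, the sample data and the threshold of 3 are assumptions made for this example only.

```python
from statistics import mean, stdev


def risk_score(history, observed):
    """Z-score of the observed value against the entity's own history (0.0 if history is flat)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0
    return (observed - mu) / sigma


# Hypothetical daily counts of files downloaded by one user over two weeks.
baseline = [20, 22, 19, 21, 20, 23, 18, 20, 22, 21, 19, 20, 21, 22]
normal_day = risk_score(baseline, 24)
unusual_day = risk_score(baseline, 400)
is_anomalous = unusual_day > 3.0  # the threshold of 3 is an assumption
```

The sketch also shows why well-defined behaviour patterns matter: the narrower the spread of the baseline, the more clearly an unusual day stands out from a merely busy one.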

4. Case Management Systems.

Case Management systems help SOC staff to track and keep records of the cyber incidents they handle. In larger and more mature SOCs this need is particularly acute, and incident management processes are more complex. A Case Management system should do the following:

- Maintain a complete record of incident information at all stages of the incident lifecycle (triage, analysis, response, closure, reporting);

- Allow structured (incident type, category, time of discovery) and unstructured (manual entry by employee) information to be recorded in a timestamped incident card;

- Provide an interface for interaction between SOC employees and representatives of the customer company, provide integration with other similar accounting systems in the company, provide functionality for sending email messages for notification, escalation, task assignment;

- Support preservation of incident artefacts (files, malware samples, events);

- Provide linking of cases to each other, as well as creation of child cases and transfer of cases between employees;

- Provide the ability for employees of the client company to create cases, e.g. via web form or email;

- Maintain a system of metrics, trend scores, feedback for tweaking detection and analytics;

- Support a role-based access control model and protection against unauthorised access by intruders.
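Several of the requirements above (structured and unstructured fields, timestamped entries, child cases, transfer between employees) can be sketched as a small data structure. The `Case` class and its field names are hypothetical; access control, artefact storage and metrics are left out to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import List, Optional, Tuple

_case_ids = count(1)  # simple in-memory ID generator for the sketch


@dataclass
class Case:
    """Minimal incident card: structured fields plus a timestamped free-text log."""
    title: str
    category: str                      # structured field
    assignee: str
    status: str = "triage"             # triage -> analysis -> response -> closed
    parent_id: Optional[int] = None
    case_id: int = field(default_factory=lambda: next(_case_ids))
    linked: List[int] = field(default_factory=list)
    log: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def note(self, author, text):
        """Append an unstructured, timestamped entry."""
        self.log.append((datetime.now(timezone.utc), author, text))

    def transfer(self, new_assignee):
        """Hand the case over to another employee and record the handover."""
        self.note(self.assignee, f"transferred to {new_assignee}")
        self.assignee = new_assignee

    def spawn_child(self, title):
        """Create a child case and link it to this one."""
        child = Case(title=title, category=self.category,
                     assignee=self.assignee, parent_id=self.case_id)
        self.linked.append(child.case_id)
        return child


parent = Case("Phishing campaign", category="phishing", assignee="analyst1")
parent.note("analyst1", "Three users reported the same email")
child = parent.spawn_child("Credential reset for affected user")
parent.transfer("analyst2")
```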

5. Security Orchestration, Automation, and Response systems.

Security Orchestration, Automation, and Response (SOAR) systems enable analysts to quickly and efficiently create and execute the repeatable processes typical of SOCs. Although similar functions may be found in SIEM and Case Management systems, the targeted use of dedicated SOAR solutions allows the SOC to:

- Collect incidents from heterogeneous, disparate systems, presenting operators with a single incident interface;

- Enrich and prioritise alerts and incidents, supplementing them with cyber threat intelligence and with information about the entities affected by the incident;

- Run automated actions to obtain additional information about the incident, e.g., running suspicious files in a sandbox, obtaining device vulnerability scans, requesting user information from HR systems;

- Perform search queries on stored events;

- Interact with the customer company's employees, e.g., requesting confirmation of actions performed or obtaining explanations;

- Perform automated active response actions, such as severing network connections or blocking accounts.
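The enrichment and prioritisation steps from the list above can be sketched as a tiny playbook in Python. The lookup tables stand in for real integrations (a TIP and an asset inventory); their contents, the function names and the scoring scheme are all made up for illustration.

```python
# Stand-ins for real integrations (TIP, asset inventory); the data is made up.
THREAT_INTEL = {"203.0.113.7": "known botnet C2"}
ASSETS = {"web01": {"owner": "e-commerce team", "critical": True}}


def enrich(alert):
    """Return a copy of the alert with threat intelligence and asset context attached."""
    enriched = dict(alert)
    enriched["intel"] = THREAT_INTEL.get(alert["src_ip"])
    enriched["asset"] = ASSETS.get(alert["host"], {"owner": "unknown", "critical": False})
    return enriched


def prioritise(alert):
    """Derive a priority from the enrichment results (hypothetical scoring)."""
    score = 0
    if alert["intel"]:
        score += 2
    if alert["asset"]["critical"]:
        score += 1
    return {0: "low", 1: "medium", 2: "high", 3: "critical"}[score]


def run_playbook(alert):
    """Enrich the alert, then attach a priority for the analyst."""
    enriched = enrich(alert)
    enriched["priority"] = prioritise(enriched)
    return enriched


incident = run_playbook({"host": "web01", "src_ip": "203.0.113.7", "rule": "failed_login_burst"})
```

An alert on a critical asset from an address known to threat intelligence ends up with the highest priority, while an alert with no matching context stays low, so the analyst sees already prioritised incidents rather than raw alerts.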

There are many different reasons to apply SOAR systems in SOC centres. The authors of the publication cite some of them:

- Too many events / alerts and lack of time to manually analyse them;

- The need for increased repeatability and integrity in performing triage and investigations;

- The ability to quickly train and involve less experienced staff in incident handling based on established practices and response procedures in the form of SOAR response scenarios written by more experienced experts;

- Lack of resources and expertise to manually implement the integration and automation techniques that SOAR offers out-of-the-box;

- Increased job satisfaction of analysts due to reduction in routine manual operations;

- Reduced incident triage time (mean/median time to detect and analyse);

- Reduced response time (mean/median containment, response, remediation time);

- Synergy of efforts of all SOC functions in analysing and detecting incidents.
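The triage and response time metrics mentioned in the list can be computed as follows. The incident durations are invented sample values, expressed in minutes since the alert was raised.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes since the alert was raised.
incidents = [
    {"triaged": 12, "contained": 95},
    {"triaged": 5, "contained": 40},
    {"triaged": 45, "contained": 600},
    {"triaged": 8, "contained": 65},
]
triage = [i["triaged"] for i in incidents]
containment = [i["contained"] for i in incidents]
metrics = {
    "mean_time_to_triage": mean(triage),
    "median_time_to_triage": median(triage),
    "mean_time_to_contain": mean(containment),
    "median_time_to_contain": median(containment),
}
```

Tracking both values is useful because a single long-running incident (the third one here) pulls the mean up sharply, while the median stays close to the typical case.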

In general, the application of SOAR solutions in a SOC centre will be most appropriate and effective when a certain level of maturity is reached:

- An incident handling process has been developed and documented that the SOC centre analysts are working to follow;

- The automation and scripting capabilities used in the SOC do not meet all needs and are difficult to keep up to date;

- There is a growing need for an orderly transfer of knowledge and practices from senior staff to junior staff, who also do not always follow typical incident handling guidelines;

- There is a growing need to co-develop and share ways to automate SOC workflows;

- Analysts have established repeatable processes for responding to typical incidents and are eager to move on to more creative tasks;

- The tools that the SOC plans to integrate with the selected SOAR solution have documented or compatible API mechanisms.

To achieve success when implementing SOAR platforms, the authors of the publication make the following recommendations:

- Invest in IS tools (SIEM/LM, TIP, network and host sensors) with well-documented API mechanisms;

- Use project management best practices, including appointing SOAR champions from within the SOC centre, defining success criteria for a SOAR implementation project, allocating resources and time to develop and test SOAR use cases;

- Begin by identifying activities that are easy to implement yet useful and frequently repeated, and automate them in SOAR first (e.g. information gathering and incident enrichment);

- For each system to be integrated with SOAR and each business unit involved, identify in advance a stakeholder to engage with, and define in advance the expected behaviour of the target system, including when automated actions fail;

- Until low-risk integrations are complete and the SOC's incident handling processes are more mature, avoid high-risk workflows that interact with systems outside the SOC's control, make irreversible changes, disrupt service availability, accounts or network traffic, make changes on a large number of systems, disrupt users, customers or revenue, or impact systems that affect human life and safety;

- When implementing high-risk workflows, avoid automated blocking on firewalls, VPN gateways and account management systems (at least for the first 3-6 months of operating the SOAR solution), and first test integrations in a controlled way on a limited set of non-critical devices outside working hours. In addition, make sure that changes made automatically during response can be rolled back quickly.
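Two of the recommendations above, defining what happens when an automated action fails and keeping automated changes reversible, can be sketched as a workflow runner in which every action is paired with its undo step and high-risk actions require explicit approval. The `Action` class, the action names and the approval flag are hypothetical and do not reflect any particular SOAR product.

```python
class Action:
    """A response step paired with the step that undoes it."""

    def __init__(self, name, do, undo, high_risk=False):
        self.name = name
        self.do = do
        self.undo = undo
        self.high_risk = high_risk


def run_workflow(actions, approved=False):
    """Run actions in order; on failure, undo the completed ones in reverse order."""
    done, journal = [], []
    for action in actions:
        if action.high_risk and not approved:
            journal.append(f"skipped {action.name}: needs analyst approval")
            continue
        try:
            action.do()
            done.append(action)
            journal.append(f"done {action.name}")
        except Exception as exc:
            journal.append(f"failed {action.name}: {exc}")
            for prev in reversed(done):
                prev.undo()
                journal.append(f"rolled back {prev.name}")
            break
    return journal


quarantined = set()  # stands in for a mail quarantine


def failing_step():
    raise RuntimeError("EDR API timeout")


workflow = [
    Action("quarantine_email", lambda: quarantined.add("msg-42"),
           lambda: quarantined.discard("msg-42")),
    Action("collect_triage_package", failing_step, lambda: None),
    Action("block_ip_on_firewall", lambda: None, lambda: None, high_risk=True),
]
journal = run_workflow(workflow)
```

In this run the second step fails, so the quarantine is rolled back and the firewall action is never reached. Whether a partial result should be rolled back or kept is itself a design decision that would need to be agreed with the owner of each integrated system.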

Although some SIEMs also have capabilities to automate response, specialised SOAR solutions have the following distinguishing features:

- Allow you to collect cases, alerts, and warnings from a variety of sources, including incoming emails, incidents from SIEMs, and EDR solution triggers;

- When responding, offer to view similar incidents/cases, show analytics on possible further actions depending on previously closed cases, use machine learning techniques to link incidents to each other;

- Provide a user-friendly graphical editor for creating response scenarios and setting up automated actions, which reduces labour costs and simplifies work, while manual editing is available for automation actions if needed;

- Provide a single framework for both developing and using response scripts;

- Provide a pre-defined set of APIs and ways to integrate with the company's existing IT/IS systems;

- Provide API integration with new or customised SOC tools;

- Unlike SIEM systems, SOAR platforms are not designed to process a high volume of IS events or to perform log management.

6. Protection of data and SOC tools.

The authors of the publication point out that attackers who know which specific monitoring technologies a SOC uses can either launch a targeted attack against those technologies or customise their tools to avoid detection. A SOC must therefore be isolated from the company's general LAN, yet it must still receive information from that network, interact with IT/IS systems, and provide reporting and situational awareness to company employees. The very existence of a SOC implies that sooner or later the customer company will suffer a cyber attack, which may affect the SOC itself, so the SOC should follow the ‘assume breach’ principle (i.e. ‘assume that you will be hacked or have already been hacked’). One of the authors' recommendations is to create a separate Windows domain to isolate the SOC infrastructure from the general infrastructure of the customer company, while IS events are still collected from systems connected to the company's general infrastructure. When planning a secure SOC architecture, you should consider the customer company's cyber threat landscape, the geographic distribution of its infrastructure, and the logical network segmentation methods in use. It is also worth determining what built-in security features the SOC tools have and whether the SOC's resources allow it to rely solely on its own dedicated hardware and network infrastructure. The authors of the publication stress that protecting the SOC infrastructure itself should be given the highest priority, and they provide tips on ensuring the internal information security of the SOC.
