Yuri Podgorbunsky, Security Vision
Introduction
In the new era of cybersecurity, it is becoming increasingly difficult to cope with the rapid growth and speed of attacks on organizational infrastructure, including attacks involving chatbots and artificial intelligence (AI)-powered agents. The topic of AI-based attacks and defenses can be considered from the following perspectives:
· AI-based attacks on infrastructure, including AI.
· Protection against attacks on AI in the organization.
· Response to attacks and incidents.
But before tackling these more complex topics, it is worth understanding what AI is and how it works in practice.
What is AI?
AI is a technology that allows computers and machines to (partially) imitate human cognitive functions such as learning, reasoning, natural language understanding, and decision making, i.e., perform tasks that are typically performed by humans.
Development of AI
For many years, the focus has been on AI models that excel at passive, discrete tasks such as:
· Answering questions
· Translating text
· Generating images and voice.
This paradigm requires constant human control at every stage. A paradigm shift is currently underway, moving from AI that simply predicts or creates content to a new development capable of autonomously solving problems and performing tasks – an AI-powered agent.
This new development is built around AI agents. An agent is not simply an AI model in a static workflow. It is a fully-fledged application that can formulate plans and take actions to achieve goals. It combines the reasoning ability of a language model (LLM) with the practical ability to act, enabling it to handle complex, multi-step tasks that a language model alone cannot. A crucial capability is that agents can work autonomously, determining the next steps necessary to achieve a given goal, without constant human supervision or guidance.
So, what is an agent?
An agent is an active, autonomous system capable of using various tools to achieve a specified goal. Some of its key features:
· Goal setting – the agent not only solves a specific problem, but also strives to achieve the established goal.
· Interaction with the environment – using tools such as API, MCP, receiving feedback with adaptation (through interfaces and integrations).
· Autonomy – allows the agent to perform certain actions independently.
· Planning – the agent is able to decompose a goal set by a person and sequentially perform actions.
Elements included in the agent:
· Application – the interface through which the user interacts with the agent.
· Interfaces and integrations – connectors that allow the agent to use external tools (NG SOAR, Vulnerability Scanner, etc.).
· Language model (LLM) – a machine learning model used for natural language processing tasks (in short, the agent's brain).
· Memory – short-term and long-term data storage.
· Knowledge base (RAG) – external data the model retrieves at query time to ground its answers (for example, cyber threat data from Threat Intelligence platforms (TIP), the MITRE ATT&CK knowledge base of known attack tactics and techniques, and vulnerability databases).
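As a toy illustration of how these elements fit together, the sketch below wires stubbed tools, memory, and a stubbed planner into a minimal agent loop. All names and the goal-decomposition logic are hypothetical, not a real agent framework:

```python
# A minimal agent sketch: tools (integrations), memory, and a stubbed
# planner standing in for the LLM "brain". Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Agent:
    tools: dict                                    # name -> callable (integrations)
    memory: list = field(default_factory=list)     # short-term memory of actions

    def plan(self, goal: str) -> list:
        # A real agent would ask the LLM to decompose the goal;
        # here we stub the decomposition for illustration.
        return [step.strip() for step in goal.split(",")]

    def run(self, goal: str) -> list:
        results = []
        for step in self.plan(goal):
            tool = self.tools.get(step, lambda: f"no tool for '{step}'")
            outcome = tool()
            self.memory.append((step, outcome))    # remember what happened
            results.append(outcome)
        return results

agent = Agent(tools={"scan": lambda: "2 open ports", "report": lambda: "report ready"})
print(agent.run("scan, report"))   # ['2 open ports', 'report ready']
```

The key point of the sketch is the loop itself: the agent decomposes a goal into steps and invokes external tools without per-step human input.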
An agent cuts both ways: it is used both to protect an organization's infrastructure (including security analysis – simulating attacks without causing damage) and by malicious actors to carry out real attacks.
Agent as a tool for information security:
· Finding vulnerabilities (before attackers find them and exploit them)
· Attack vector forecast
· Modeling information security threats
· Anomaly detection
· Cyber incident response
· Strengthening the capabilities of the cybersecurity team.
Agent on the part of the attackers:
· Conducting phishing campaigns
· Social engineering
· Falsification of a person's appearance and voice
· Finding and exploiting vulnerabilities
· Implementation of threats (e.g., data poisoning in AI)
· Bypassing information security measures
· Generation of malicious code.
Agents are also used as security analysis tools (conducting controlled attacks on infrastructure – for example, AI Red Teaming), with the aim of identifying cybersecurity vulnerabilities, threats, and risks before attackers can exploit them and disrupt business processes, systems, and the organization's infrastructure as a whole.
Infrastructure security analysis
There are security analysis tools on the market, and this field is developing rapidly. As noted above, such tools are used, at a minimum, to identify infrastructure vulnerabilities for subsequent defense (protecting systems, networks, and the sensitive information they process). Unfortunately, these same tools are also used for malicious purposes – to launch attacks on organizations and their infrastructure.
Consider HexStrike, an open-source AI platform created with the best intentions for offensive security (security testing) and designed to strengthen defenses.
HexStrike Composition
HexStrike consists of the following elements:
· The platform or framework itself
· Specialized agents
· Security tools
· Client and server (Model Context Protocol).
HexStrike is a server platform that includes more than a dozen agents that solve highly specialized tasks, such as:
· Intelligent Decision Engine – determines the appropriate tools for a specific task.
· CVE Intelligence Agent – manages vulnerability intelligence.
· Exploit Generator – generates code to exploit discovered vulnerabilities.
· Agents for network analysis, web application testing, binary analysis, and other tasks.
HexStrike integrates with over a hundred security tools that cybersecurity professionals use every day through the Model Context Protocol (MCP) – an application-level protocol for interaction between language models (LLMs) and external data sources and tools.
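MCP is built on JSON-RPC 2.0, so a client's request to run a tool looks roughly like the sketch below. The tool name and arguments are illustrative, not actual HexStrike identifiers:

```python
# A hedged sketch of an MCP tool-call message: MCP uses JSON-RPC 2.0,
# and a client asks a server to run a tool via the "tools/call" method.
# Tool name and arguments below are hypothetical.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "port_scan",                              # hypothetical tool name
        "arguments": {"target": "10.0.0.5", "ports": "1-1024"},
    },
}
print(json.dumps(request, indent=2))
```

The server replies with a matching JSON-RPC response containing the tool's result, which the agent then feeds back into its reasoning loop.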
MCP server
The HexStrike MCP server is the core of the platform. It:
· Accepts requests from agents via the MCP protocol
· Coordinates the selection, execution, and results of tools
· Manages the execution status of agents' tasks and their results.
As the central component, the MCP server:
· Analyzes the target (host, web application, etc.)
· Determines the optimal set of tools
· Configures launch parameters (e.g., port range, scan depth)
· Forms a testing strategy (e.g., starting with reconnaissance, then scanning, then exploitation).
How does this work?
The user forms a natural-language query, for example: "Find vulnerabilities on the target host (specified by hostname or IP address) and exploit them."
The MCP server and decision engine transform the user's query into structured calls to security tools. All further actions of the HexStrike AI platform are performed automatically: agents execute assigned tasks, analyze the results, adjust strategies if necessary, and retry until the goal is achieved.
HexStrike's results can be generated in the form of a structured report.
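The staged workflow described above – reconnaissance, then scanning, then exploitation, ending in a structured report – can be sketched as a simple pipeline. Stage names and stubbed tools are illustrative, not HexStrike internals:

```python
# A simplified sketch of a staged testing strategy producing a
# structured report. Stages and stubbed results are illustrative.
def run_pipeline(target: str) -> dict:
    stages = {
        "reconnaissance": lambda t: {"hostnames": [t]},
        "scanning": lambda t: {"open_ports": [22, 443]},
        "exploitation": lambda t: {"exploited": []},
    }
    report = {"target": target, "findings": {}}
    for stage, tool in stages.items():       # fixed ordering: recon -> scan -> exploit
        report["findings"][stage] = tool(target)
    return report

print(run_pipeline("10.0.0.5")["findings"]["scanning"])  # {'open_ports': [22, 443]}
```

In a real platform each stage would feed the next (scan targets come from recon, exploit candidates from scan results); here the coupling is omitted for brevity.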
AI Attack Chain
The AI attack chain consists of five stages – reconnaissance, poisoning, hijacking, persistence, and impact – with an iteration-and-pivoting loop that feeds back into earlier stages. Below, we examine each stage of the attack chain in more detail.
What happens during the reconnaissance phase of the AI attack chain?
During the reconnaissance phase, the attacker maps the system to plan an attack. Key questions they ask themselves at this stage include:
· What are the ways in which controlled data can enter an AI model?
· What tools, MCP servers, or other features does the application use that attackers could exploit?
· What open source libraries does the application use?
· Where are system restrictions applied and how do they work?
· What types of system memory does the application use?
Reconnaissance is often interactive. Attackers probe the system to elicit errors and observe its behavior. The more they learn about the system's behavior, the more precise their subsequent actions will be.
Defense priorities for disrupting reconnaissance:
· Access Control: Restrict system access to authorized users.
· Minimize information: Remove error messages, system hints, verbose debug details, and component identifiers from output.
· Probing behavior monitoring: Implement telemetry to detect unusual inputs or access patterns that indicate reconnaissance.
· Model hardening: Fine-tune models to resist prompt-extraction and information-gathering attacks.
Disrupting reconnaissance early prevents attackers from gaining the knowledge needed to conduct precise attacks later in the attack chain.
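A minimal sketch of the "minimize information" defense: responses that match patterns suggesting internal detail are replaced with a generic message. The patterns below are illustrative; a real deployment would tune them:

```python
# A toy error sanitizer: if a message looks like it leaks internals
# (stack traces, file paths, version numbers), return a generic reply.
# Patterns are illustrative only.
import re

SENSITIVE = [r"Traceback", r"at /[\w/.-]+", r"v\d+\.\d+\.\d+"]

def sanitize_error(message: str) -> str:
    for pattern in SENSITIVE:
        if re.search(pattern, message):
            return "An error occurred. Please try again."   # generic reply
    return message

print(sanitize_error("KeyError at /app/models/llm.py line 42"))
print(sanitize_error("Invalid input"))
```

The first call is suppressed because it leaks a file path; the second, harmless message passes through unchanged.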
How do attackers poison AI systems at this stage?
During the poisoning phase, the attacker's goal is to place malicious inputs in locations where they will ultimately be processed by the AI model. Two main methods predominate:
· Direct prompt injection: The attacker is the user and enters malicious input through normal interaction. The impact is typically limited to the attacker's own session, but this is useful for probing behavior.
· Indirect prompt injection: The attacker poisons data that the application ingests on behalf of other users (e.g., RAG databases, shared documents). This is where the scale of impact grows.
The most common poisoning method is via text input. However, there are others, such as:
· Training data poisoning: Injecting corrupted data into datasets used to fine-tune or train models.
· Adversarial example attacks: Manipulating input data at the bit level (images, audio, etc.) to force misclassification.
· Visual payload: malicious symbols, stickers, or hidden data that influence the model's results in a physical context (e.g., autonomous vehicles).
Priorities for protection against poisoning:
· Sanitize all data: Don't assume internal pipelines are safe; apply security barriers to user input, RAG sources, plugin data, and API feeds.
· Rephrase input: Rewrite or paraphrase content before accepting it to neutralize embedded instructions.
· Vet data sources: Clean all publicly accessible data sources before ingestion.
· Monitor data ingestion: Track unexpected data spikes, anomalous injections, or high-frequency contributions to ingestion pipelines.
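The ingestion-monitoring idea above can be sketched as a toy frequency check that flags a source whose contribution volume jumps far above its historical average. The threshold and features are illustrative:

```python
# A toy ingestion monitor: flag a source whose hourly contribution
# volume exceeds a multiple of its historical average. Threshold is
# illustrative, not a recommendation.
from collections import defaultdict

history = defaultdict(list)

def record_and_check(source: str, docs_this_hour: int, factor: float = 3.0) -> bool:
    past = list(history[source])          # snapshot of prior observations
    history[source].append(docs_this_hour)
    if len(past) < 3:
        return False                      # not enough history yet
    baseline = sum(past) / len(past)
    return docs_this_hour > factor * baseline   # True = anomalous spike

for volume in [10, 12, 11, 95]:
    print(record_and_check("shared-wiki", volume))  # False, False, False, True
```

In practice such a check would feed an alert pipeline rather than return a boolean, but the principle – baseline per source, flag deviations – is the same.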
How can attackers hijack the behavior of an AI model after successful poisoning?
The hijacking phase is where the attack becomes active. Malicious inputs successfully introduced during the poisoning phase are consumed by the AI model, hijacking its output to serve the attacker's goals. Common hijacking schemes include:
· Attacker-controlled tool usage: Forcing a model to call specific tools with parameters defined by the attacker.
· Data exfiltration: Encoding sensitive data from the model's context into output (e.g., URLs, CSS, file entries).
· Generating disinformation: Creating responses that are deliberately false or misleading.
· Context-aware payloads: Executing malicious behavior only in the target user's context.
In agent-based workflows, hijacking becomes even more powerful. The model's increased autonomy means attackers can manipulate its goals, not just its output, forcing it to perform unauthorized actions on its own.
Defense priorities to prevent hijacking:
· Separate trusted and untrusted data: Avoid processing adversary-controlled and sensitive data in the same model context.
· Improve model robustness: Use adversarial training, robust RAG, and instruction hierarchy techniques to train models to resist injection patterns.
· Validate tool calls with context: Ensure that each tool call matches the user's original request.
· Implement output-level safety barriers: Check the model's output for its intended purpose, safety, and impact before use.
Hijacking is the critical point at which an attacker gains functional control. Breaking the chain here protects downstream systems, even if the poisoning itself wasn't completely prevented.
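One way to sketch "validate tool calls with context" is an allowlist of tools derived from the user's original request. The intent-to-tool mapping below is purely illustrative:

```python
# A toy tool-call validator: a tool call is allowed only if the tool
# belongs to the set implied by the user's original intent.
# The mapping is illustrative, not a real policy.
INTENT_TOOLS = {
    "summarize_report": {"read_file"},
    "triage_incident": {"read_file", "query_siem"},
}

def validate_tool_call(user_intent: str, tool: str) -> bool:
    allowed = INTENT_TOOLS.get(user_intent, set())   # unknown intent -> nothing allowed
    return tool in allowed

print(validate_tool_call("summarize_report", "read_file"))   # True
print(validate_tool_call("summarize_report", "send_email"))  # False
```

Even if an injected prompt convinces the model to call `send_email`, the validator rejects the call because it does not match the original request.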
How do attackers maintain their influence across sessions and systems?
Persistence allows attackers to turn a single breach into persistent control. By injecting malicious data into persistent storage, attackers maintain their influence both within and between user sessions. Persistence methods depend on the application architecture:
· Session history persistence: In many applications, injected prompts remain active for the duration of the session.
· Cross-session memory: In systems with user memory, attackers can inject payloads that persist across sessions.
· Shared resource poisoning: Attackers target shared databases (e.g. RAG sources, knowledge bases) to affect multiple users.
· Agent plan persistence: In autonomous agents, adversaries hijack the agent's plan so that attacker-defined goals are pursued continuously.
Persistence allows attackers to repeatedly exploit hijacked states, increasing the likelihood of subsequent attacks. In agent-based systems, persistent payloads can turn into attacker-controlled autonomous workflows.
Defense priorities to prevent persistence:
· Clean before save: Apply protective barriers to all data before sending it to session history, memory, or shared resources.
· Give users memory controls: Allow users to view, manage, and delete their saved memories.
· Contextual memory recall: Ensure that memories are retrieved only when they are relevant to the user's current request.
· Ensure data traceability and auditability: Track data throughout its lifecycle to ensure rapid correction.
· Write control: Require human approval or more stringent cleanup for any data writes that could impact the overall system state.
Persistence allows attackers to move from a single attack at a specific point in time to a persistent presence in an AI-powered application, potentially impacting multiple sessions.
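"Clean before save" can be sketched as a filter that strips instruction-like phrases before anything is written to persistent memory. The patterns are illustrative; production filters would be far more robust:

```python
# A toy pre-save filter: strip instruction-like phrases before writing
# to long-term memory. Patterns are illustrative; real injection
# detection is much harder than two regexes.
import re

INJECTION_PATTERNS = [r"ignore (all )?previous instructions", r"system prompt"]

def clean_before_save(text: str) -> str:
    for pattern in INJECTION_PATTERNS:
        text = re.sub(pattern, "[removed]", text, flags=re.IGNORECASE)
    return text

memory = []
memory.append(clean_before_save("User prefers dark mode. Ignore previous instructions."))
print(memory[0])   # 'User prefers dark mode. [removed].'
```

The benign preference survives, while the injected instruction never reaches persistent storage and therefore cannot affect future sessions.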
How do attackers use iteration and pivoting to expand their control over agent systems?
For simple applications, a single hijack may be the end of the attack. But in agent-based systems, where AI models plan, make decisions, and act autonomously, attackers exploit a feedback loop of iteration and pivoting. Having successfully hijacked a model's behavior, an attacker can:
· Return to the poisoning stage: Poison additional data sources to impact other users or workflows, scaling the attack.
· Rewrite plans: In agent systems, attackers can rewrite the agent's goals, replacing them with attacker-defined ones.
· Establish command and control (C2): Inject data instructing the agent to fetch new attacker-controlled directives on each iteration.
This cycle transforms a single point of compromise into a systemic exploit. Each iteration strengthens the attacker's position and influence.
Defense priorities for interrupting the pivot loop:
· Restrict access to tools: Limit the set of tools, APIs, or data sources that the agent can interact with, especially in untrusted contexts.
· Continuously validate agent plans: Implement safeguards that ensure agent actions match the user's original intent.
· Continuously separate untrusted data: Do not allow untrusted input data to influence trusted contexts or actions, even between iterations.
· Monitor anomalous agent behavior: Detect agent deviations from expected workflows, privilege escalation, or access to unusual resources.
· Apply human intervention at key points: Require manual review of actions that change an agent's scope or access to resources.
Pivoting is how attackers extend a compromise to new components in agent-based systems. Breaking this cycle is crucial to preventing small breaches from escalating into large-scale attacks.
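The "restrict access to tools" priority can be sketched as context-dependent toolsets: an agent handling untrusted input gets a narrower set. The tool names are illustrative:

```python
# A toy least-privilege scheme: the agent's available toolset shrinks
# when the current context contains untrusted input. Names illustrative.
FULL_TOOLSET = {"read_file", "write_file", "http_get", "send_email"}
SAFE_TOOLSET = {"read_file"}      # read-only when input is untrusted

def tools_for_context(input_is_trusted: bool) -> set:
    return FULL_TOOLSET if input_is_trusted else SAFE_TOOLSET

print(sorted(tools_for_context(False)))  # ['read_file']
```

Because a hijacked agent in an untrusted context can only read, the pivot loop has nothing to write with, send with, or fetch new directives through.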
What kind of impact can attackers achieve with hacked AI systems?
Impact is the stage where the attacker's goals materialize: the output of a compromised model triggers actions that affect systems, data, or users outside the model itself.
In AI-powered applications, impact occurs when output data is connected to tools, APIs, or workflows that perform actions in the real world:
· State-changing actions: Modifying files, databases, or system configurations.
· Financial transactions: Approving payments, initiating transfers, or altering financial records.
· Data theft: Encoding sensitive data into output that leaves the system (e.g., via URLs or API calls).
· External communications: Sending emails, messages, or commands on behalf of trusted users.
The AI model itself often cannot act directly, but its output can. Security must therefore extend beyond the model to control how its output is used downstream.
Defense priorities to limit impact:
· Classify security-sensitive actions: Identify which tool calls, APIs, or actions can change external state or disclose data.
· Gate security-sensitive actions: Require human approval or automated policy checks before execution.
· Design with least privilege: Give tools a narrow scope to minimize abuse; avoid feature-rich APIs that expand the attack surface.
· Sanitize output: Remove data that could trigger unintended actions (e.g., scripts, file paths, untrusted URLs).
· Apply content security policies: Prevent front-end exfiltration techniques such as loading malicious URLs or inline CSS attacks.
Strong downstream control over tool invocations and data flows can often deter attackers.
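A hedged sketch of output sanitization against exfiltration: URLs whose domain is not on an allowlist are removed before the output is rendered. The allowlist and regex are illustrative only:

```python
# A toy output sanitizer: keep only URLs whose domain is allowlisted,
# blocking URL-based exfiltration of data encoded in query strings.
# Allowlist and regex are illustrative.
import re

ALLOWED_DOMAINS = {"docs.example.com"}

def strip_untrusted_urls(text: str) -> str:
    def keep(match):
        url = match.group(0)
        domain = re.sub(r"^https?://", "", url).split("/")[0]
        return url if domain in ALLOWED_DOMAINS else "[link removed]"
    return re.sub(r"https?://\S+", keep, text)

print(strip_untrusted_urls("See https://docs.example.com/a and https://evil.io/x?d=secret"))
```

The trusted documentation link survives, while the attacker-controlled URL carrying `d=secret` never reaches the user's browser.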
Integration of security tools with agents
The integration of SIEM, SOAR and agents creates an automated, predictive and adaptive security system that can be used at the following levels:
· Collection and correlation of events, including detection and rejection of false positives (e.g. Security Vision SIEM and agent).
· Analysis, forecast and recommendations (agents).
· Automatic and automated response (SOAR).
This approach ensures early detection, faster response and damage minimization.
Event collection and correlation
The SIEM performs its basic functions: normalization, correlation, and prioritization of events. The agent initially operates in a supervised training mode for false positives (FPs): a cybersecurity expert explicitly flags FPs for the agent. Over time, the agent learns to recognize FPs itself, reducing the experts' workload by automatically closing obvious FPs. Events are then sent to agents for in-depth analysis.
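The supervised FP workflow can be sketched as follows: experts label alerts, and once a rule's observed FP rate is high enough the agent auto-closes matching alerts. The threshold and minimum sample size are illustrative, not Security Vision's actual logic:

```python
# A toy supervised FP-triage loop: experts label alerts per detection
# rule; the agent auto-closes a rule's alerts only after enough labels
# show a high FP rate. Threshold/sample size are illustrative.
from collections import defaultdict

labels = defaultdict(lambda: {"fp": 0, "total": 0})

def expert_label(rule: str, is_fp: bool):
    labels[rule]["total"] += 1
    labels[rule]["fp"] += int(is_fp)

def agent_should_autoclose(rule: str, threshold: float = 0.9, min_samples: int = 5) -> bool:
    stats = labels[rule]
    if stats["total"] < min_samples:
        return False                           # not enough expert labels yet
    return stats["fp"] / stats["total"] >= threshold

for _ in range(6):
    expert_label("dns-beacon-lowconf", True)   # expert marks six alerts as FP
print(agent_should_autoclose("dns-beacon-lowconf"))  # True
```

The `min_samples` guard keeps the agent in supervised mode until the experts have provided enough evidence, which is the point of the training phase described above.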
Analysis, forecast and recommendations
Agents receive a stream of events from the SIEM and perform:
Anomalous behavior analysis (drawing on data from tools such as Security Vision UEBA)
· Detection of unusual activity by users, devices, and processes.
· Forecasting potential threats based on historical data and attack patterns.
Correlation and data enrichment
· Consolidate events from different sources: network, applications, clouds, etc.
· Identifying Complex Attack Chains (Advanced Persistent Threat).
Risk assessment and prioritization
· Assigning risk levels to events and incidents.
· Determining which incidents require immediate intervention.
Formation of recommendations
· Suggestions for safe response (e.g. blocking users, isolating devices, checking suspicious IPs, etc.).
· Providing explainable solutions for auditing and reporting.
Automatic and automated response
SOAR accepts agent recommendations and implements pre-approved safe actions:
Automatic incident response:
· Isolation of compromised hosts.
· Temporary blocking of suspicious accounts.
· Forced password reset.
· Security Operations Center (SOC) notifications.
Automation of routine actions:
· Aggregation and classification of new events.
· Running checks and scans only in a safe, controlled manner.
· Integration with external and internal threat databases (e.g. Security Vision TIP).
Ensuring consistency and control:
· Actions are strictly within the security policy.
· Each transaction is recorded for audit purposes.
For example, Security Vision SOAR using machine learning models provides:
· FP assessment – the model is trained on data from closed incidents; when a new incident occurs, its similarity to previously closed FPs is estimated as a match percentage.
· Incident Severity Assessment – The model generates a severity assessment based on: the number of affected hosts, including their severity; the accounts used; and security bulletins, reducing response time from hours to minutes.
· Incident similarity assessment – the model analyzes the context of the incident, highlights similar previously processed incidents and the actions taken in the response process.
· Knowledge base recommendations – the model highlights the actions an information security specialist can take at specific response phases.
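Similarity-based FP assessment can be sketched with a simple Jaccard score over incident features, reported as a match percentage. The features and scoring are illustrative, not Security Vision's actual model:

```python
# A toy similarity scorer: compare a new incident's feature set to
# previously closed FP incidents and report the best Jaccard match
# as a percentage. Features and scoring are illustrative.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

closed_fps = [
    {"rule:dns-beacon", "host:ws-12", "user:svc"},   # hypothetical closed FPs
    {"rule:port-scan", "host:srv-1"},
]

def fp_similarity(incident: set) -> int:
    return round(100 * max(jaccard(incident, fp) for fp in closed_fps))

print(fp_similarity({"rule:dns-beacon", "host:ws-12", "user:admin"}))  # 50
```

A real model would use learned embeddings or classifiers rather than raw set overlap, but the output shape – a percentage match to past FPs – is the same.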
Feedback loop
1. The response results are recorded in the SIEM and transmitted to agents.
2. Agents update anomaly models, taking into account new attack patterns.
3. SOC receives updated recommendations and improved reports.
Effect: The system becomes adaptive, increasing detection accuracy and response efficiency with each incident.
In conclusion
The number and intensity of attacks will only grow over time, including through the use of AI-based tools and capabilities, and attackers are no longer constrained by time of day, holidays, or weekends. It is therefore necessary to build comprehensive, multi-layered protection for systems and networks: at a minimum, proactively model information security threats (not just on paper), continuously improve security, and respond quickly and effectively to attacks and incidents. All this leads toward autonomous security systems – though, of course, not without human involvement.