ChatGPT in IS - on the dark side and the light side

20.08.2023


Security Vision

OpenAI was founded seven years ago. Even then it was clear that something important, a qualitative breakthrough, was happening in artificial intelligence. The founders of OpenAI (a research lab of no more than a hundred people) set themselves the goal of catalysing knowledge by using language-model technology to build a positively controlled tool for mass use.

Progress in machine learning delights and worries at the same time: the whole world is now working out what a technology that will become an integral part of our lives can do. We would like to believe that we will use artificial intelligence in the right way, but in information security (IS) it is applied in very different ways, on both the light side and the dark side. Let's look at what ChatGPT is today and at the principles behind the technology.

Figure 1: An example of an interaction with ChatGPT.

ChatGPT is essentially a ‘chatbot on steroids’: an artificial intelligence that generates personalised responses based on user input and answers almost any question in a surprisingly natural way. OpenAI's official page says: ‘We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.’

Model differences

OpenAI models available for interaction:

- GPT-3.5 and GPT-4 are the basic models we work with via Telegram or via the API with default settings. These are unmodified general-purpose models, optimised for the broad range of queries an average user sends.

- Ada - the fastest of all models, capable of performing tasks such as text parsing, address correction and less complex classification tasks.

- Babbage - best suited for simple classification tasks and for SEO text analysis.

- Curie - suitable for classification tasks and sentiment analysis. The model also produces results for queries, answers questions and can be used as a general purpose chatbot. The comparison shows that it can perform many of Davinci's tasks, but at 10% of the cost.

- Davinci - the largest model of the GPT-3 family, with 175 billion parameters. It copes well with tasks that involve finding cause-and-effect relationships and generates better text on complex tasks. In return, Davinci consumes more resources and time.

Interaction with the model takes place through prompts, i.e. the queries we send to the neural network. The effectiveness of ChatGPT therefore depends directly on how you construct a query. While studying the technology, we found that the more detailed the question (the more context about the situation we put into its text), the more accurate and useful the answer we received on the first iteration.

What is the right way to ask a question?

There are a few basic rules when writing a question to OpenAI models:

1. State the role of the person asking: I am an IS analyst, a student or a DevSecOps engineer.

2. It is important to use a verb - the action you expect from the chat: describe, write, rewrite, translate.

3. Don't forget to use leading questions and instructions in describing the situation: Am I understanding correctly? Give me an example! etc.

4. It is better to write in English: the answer will come faster and be of higher quality (you can translate with DeepL, which also runs on neural networks).

Figure 2. ChatGPT's explanation of what a prompt is.
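The rules above can be sketched as a small helper that assembles a prompt from a role, an action verb, some context and leading instructions. This is only an illustration: the function name `build_prompt` and its parameters are hypothetical and are not part of any OpenAI tooling.

```python
def build_prompt(role, action, subject, context="", follow_ups=()):
    """Compose a prompt from a role, an action verb, context and leading instructions."""
    # Rule 1: state who is asking. Rule 2: start the request with a verb.
    parts = [f"I am {role}.", f"{action.capitalize()} {subject}."]
    if context:
        parts.append(f"Context: {context}")
    # Rule 3: append leading questions and instructions as they are.
    parts.extend(follow_ups)
    return " ".join(parts)


prompt = build_prompt(
    role="an information security analyst",
    action="describe",
    subject="what Windows event 4688 means",
    context="I am triaging an alert in a SIEM.",
    follow_ups=["Give me an example!"],
)
print(prompt)
```

The resulting string can be pasted into the chat or sent through the API; the point is simply that role, verb and context are always present.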

Use all the features of ChatGPT

In order to get a faster and better answer, we have noted a few important points for ourselves:

1. Describe roles carefully (not only your own, but also those of the other participants in the situation). This sets the tone of the conversation, that is, the language in which the model will talk to you.

Examples:

a. You can ask the model to explain something in a way that a child can understand. For example, the author once used the chat to explain a Python script to her ten-year-old son, and the mission was accomplished.

b. Or another situation: if you ask the chatbot for an example of Wi-Fi attacks from the point of view of a SOC analyst and from the point of view of a regulator, you will get completely different results.

2. Use the model's ability to make decisions: ask for recommendations for your situation, and ChatGPT will give you personalised advice.

The difference between chat and search engines is that ChatGPT can formulate search queries itself, navigate through them, analyse the content of web pages, and even transcribe video into text. And while the machine is doing all this, it's actually recording its train of thought to make the next search query even more accurate in the context of a given situation. You could do all this yourself, but searching for information is a tedious enough task, and people don't really like to do it. It is much more interesting to control, to take the position of a manager, where you can, if you want, triple check your work, correct a fragment of the model's reasoning chain and send the result for refinement.

Sometimes ChatGPT provides incorrect information, especially if the original data contains errors or inaccuracies. We have seen it reference made-up Python libraries. Such errors arise primarily because ChatGPT has no real-time access to the Internet. The model was trained on data up to September 2021, so keep in mind that recent information will not be reflected in its answers.
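One cheap guard against invented libraries is to check the imports in a generated script before running it. The sketch below uses only the standard `ast` and `importlib` modules; the function name `missing_modules` and the sample module name are hypothetical.

```python
import ast
import importlib.util


def missing_modules(source: str) -> list[str]:
    """Return top-level modules imported by `source` that cannot be found locally."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    # find_spec returns None when no installed module has that name
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


generated = "import json\nimport totally_made_up_lib_xyz\nfrom os import path\n"
print(missing_modules(generated))
```

A module reported as missing may simply not be installed, so each hit still has to be checked against the package index by hand; the check only tells you where to look.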

To summarise: if you are thinking about choosing a profession of the future, it is already clear that a neural network architect or query operator will be one of the most in-demand jobs.

Limitations

Probably the most painful limitation of the model is the cap on the number of tokens (fragments of words) shared between a single request and its response. The limit depends on the version used: about 4,000 tokens for GPT-3.5 and about 8,000 tokens for GPT-4. This forces you to split long queries into parts and affects the result: the answer may not be detailed enough.

At the same time, do not be mistaken about the public capabilities of ChatGPT: in mass use it carries a huge number of restrictions on the information it returns. The restrictions come from ethical and moral norms, the current tolerance agenda and, of course, the safety of using the output. That is, if we ask ChatGPT to write malicious code (the model is good at writing code, as it has been trained on gigabytes of code from Git repositories), we will receive a polite ethical refusal. ChatGPT will not make predictions, produce offensive content, or write scripts for hacking, DDoS and so on.

But people would not be human if they had not found a way around the locks placed on the neural network. The restrictions are lifted by ‘rocking’ the chat or by planting logic errors in the questions (for example, asking questions from several roles within one dialogue, as in a dissociative disorder). This technique is called a jailbreak. Crafting logic errors by hand is tedious and takes time, so users automated the process and built browser extensions that ship sets of jailbreak prompts on popular topics. In principle, sets of preconfigured prompts are an effective way not only to bypass limitations but also to improve the quality of the answers and to get them faster, because they steer the system in the right direction.
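A long query can be split to fit the limit before it is sent. The sketch below packs paragraphs into chunks under a token budget. It assumes a rough average of four characters per token for English text, which is only a heuristic; exact counts require the tokenizer of the specific model. The names `estimate_tokens` and `split_by_budget` are hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def split_by_budget(paragraphs, max_tokens):
    """Greedily pack paragraphs into chunks whose estimated size fits max_tokens.

    A single paragraph larger than the budget still becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for paragraph in paragraphs:
        cost = estimate_tokens(paragraph)
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(paragraph)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


log_parts = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
chunks = split_by_budget(log_parts, max_tokens=20)
print(len(chunks))
```

In practice the budget should be set well below the model's limit, because the answer consumes tokens as well.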

Figure 3: An example of a browser extension with ready-made DAN (Do Anything Now) prompts.

However, despite all these advantages, the immense power of this tool can also be exploited by attackers.

Dark Side

For us, ChatGPT is a quick source of insights and endless analytics possibilities. Unfortunately, these capabilities are equally accessible to people with bad intentions. Of course, the creators have built software safeguards into the model to prevent it from writing malicious code or tailoring spear phishing to a particular company, but the restrictions can be bypassed (at the very least with a jailbreak). As a result, the internet is full of reviews of ChatGPT use cases in the following areas:

- OSINT

- Phishing

- DDoS

- Brute force

- SQL injection

- Scanning for buffer overflow exploits

- RCE code.

The materials come as descriptions, explanations, Git projects or even ready Python code.

Figure 4: Example script for a DDoS attack in Python.

Until recently you had to spend a long time reading forums and searching for materials and examples. Now good use cases and guides are one click and a couple of prompts away. That is, an attacker is separated from illegal actions (e.g. OSINT) by one or two questions to ChatGPT.

Figure 5. Script for scanning for buffer overflow exploits.

We have already seen fraud grow through artificial intelligence techniques that represent a new form of compromising information (e.g. deepfakes). The trend was so dangerous that it triggered a wave of research, reports and warnings from vendors and analytics agencies around the world. Now this type of cyber fraud is being augmented by the next generation of artificial intelligence in the form of ChatGPT, whose use can increase fraudulent activity exponentially.

The days of looking for a misspelled URL or spelling mistake as an indicator of a phishing attack seem to be behind us. The tool creates such a convincing, coherent and grammatically correct text on any topic that even professional editors cannot identify its artificial origin. As a consequence, we see frequent cases of using the model to generate phishing emails.

And this is a very serious problem. An example of artificial intelligence ‘creativity’ is an article claiming that Elon Musk died in a car crash in March 2018. Someone did indeed die in a Tesla crash, and it was in the news, but the model bent the facts, created fake articles and included links in the material as confirmation. The result is a fake sensation that is plausible but not true.

Many shades of grey

As for indirect rather than direct threats: when working with ChatGPT, always keep control over which sensitive organisational data or personal information you pass into the model. Everything that has ever been on the internet is already public. To work with the model competently on day-to-day tasks, you need to understand how to obfuscate or depersonalise data: do not expose IP addresses or NTLM hashes, and do not send Mimikatz output. One way to obfuscate is to add noise to the query: it will not reduce the quality of the model's answer, but it will increase the entropy of the transmitted information. That is, build the query from a mix of real data and decoy data.

Let's consider an example of using OpenAI to analyse (enrich) hashes seen in the context of an IS event or incident: when requesting additional information and maliciousness verdicts on suspicious files, we can dilute them with randomly generated hashes. Half-fake data provides the right level of obfuscation: the model will not understand which hashes were actually seen on the infrastructure and which were added artificially.
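The dilution described above can be sketched as follows: real SHA-256 values are mixed with randomly generated decoys of the same shape, the list is shuffled, and answers about decoys are dropped afterwards. The function names `dilute_hashes` and `keep_real` are hypothetical, and the one-to-one ratio of decoys to real hashes is only an assumption.

```python
import hashlib
import random
import secrets


def dilute_hashes(real_hashes, decoys_per_real=1):
    """Mix real SHA-256 hex digests with random decoys and shuffle the result."""
    real = {h.lower() for h in real_hashes}
    decoys = set()
    while len(decoys) < len(real) * decoys_per_real:
        candidate = secrets.token_hex(32)  # 64 hex characters, same shape as SHA-256
        if candidate not in real:
            decoys.add(candidate)
    mixed = list(real | decoys)
    random.SystemRandom().shuffle(mixed)  # hide which entries came first
    return mixed, real


def keep_real(verdicts, real):
    """Drop answers about decoys from a {hash: verdict} mapping."""
    return {h: v for h, v in verdicts.items() if h.lower() in real}


seen = [hashlib.sha256(b"sample-1").hexdigest(), hashlib.sha256(b"sample-2").hexdigest()]
to_send, real = dilute_hashes(seen)
print(len(to_send))
```

Only `to_send` goes into the query; the `real` set stays on the analyst's side and is used to filter the answer.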

You can also obfuscate data with masking rules for user names and host names; passwords, of course, should be salted.
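A minimal sketch of such masking rules is shown below: known user and host names, plus anything that looks like an IPv4 address, are replaced with stable placeholders before the text is sent, and restored in the answer afterwards. The class name `Pseudonymizer` and the placeholder format are hypothetical.

```python
import re


class Pseudonymizer:
    """Replace sensitive values with stable placeholders and restore them later."""

    def __init__(self):
        self.forward = {}   # real value -> placeholder
        self.backward = {}  # placeholder -> real value

    def _alias(self, kind, value):
        if value not in self.forward:
            alias = f"{kind}_{len(self.forward) + 1}"
            self.forward[value] = alias
            self.backward[alias] = value
        return self.forward[value]

    def mask(self, text, users=(), hosts=()):
        for kind, values in (("USER", users), ("HOST", hosts)):
            for value in values:
                text = re.sub(re.escape(value), self._alias(kind, value),
                              text, flags=re.IGNORECASE)
        # IPv4 addresses are matched by pattern rather than by a known list
        ip_pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
        return re.sub(ip_pattern, lambda m: self._alias("IP", m.group()), text)

    def unmask(self, text):
        # Longest placeholders first, so USER_10 is not mistaken for USER_1
        for alias in sorted(self.backward, key=len, reverse=True):
            text = text.replace(alias, self.backward[alias])
        return text


p = Pseudonymizer()
log = "jsmith logged on to SRV-DB01 from 10.0.0.15"
masked = p.mask(log, users=["jsmith"], hosts=["SRV-DB01"])
print(masked)
```

The mapping lives only in the `Pseudonymizer` object on the analyst's machine, so the model sees placeholders while the analyst can still read the answer in terms of real names.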

Just keep in mind that chat has memory.

On the bright side

We think that most of us haven't even realised all the possibilities that ChatGPT brings. You could ask a machine to make some graphs for research - that would be a super high-level instruction with a lot of intent behind it (when the user doesn't know what they want themselves). Artificial intelligence will have to surmise what the user might be interested in and produce a few ideas (bar charts, timeline, tag cloud). But, most interestingly, artificial intelligence can actually make them.

Thanks to ChatGPT, an IS analyst can quickly find out:

- How the MITRE ATT&CK matrix is used

- What Windows event 4688 is

- Where to start investigating potentially infected macOS laptops

- The stages of an incident investigation

- How to build incident response on the first line of the SOC

- What the basic persistence methods in a Windows system are

- The basic methods of lateral movement

- And much, much more.

Figure 6: An enquiry about how to respond to incidents on the first line of SOC.

To illustrate, imagine the following situation: an employee who yesterday was doing IT tasks (for example, setting up CryptoPro keys) has been moved to the first line of the SOC. To prove their worth and gain a foothold in a promising new field, a junior specialist needs to pick up the basics quickly and immerse themselves in the SANS and NIST frameworks.

ChatGPT can quickly walk such a person, whatever their background, through the steps of incident investigation and response, based on questions like these:

- What is data source in SIEM?

- What are response and enrichment playbooks?

- How is containment performed?

ChatGPT will really help.

Figure 7: A query that accelerates the collection of information about containment methods.

There are many use cases for ChatGPT in IS: artificial intelligence can make first-line processes more efficient, improve the quality of analytics by investigating what was not investigated before (covering blind spots), and help teams arrive at best practices. All of this can be done just by opening a browser with ChatGPT.

With great power comes great responsibility

There is constant debate around OpenAI's technology, because it has both obviously good and potentially bad uses (and we are no longer talking about dystopian fears). Is the tool a Pandora's box? What broader implications await us once it is deeply integrated into everyday life?

We believe that technology should be approached soberly: keep the risks in mind, understand the limitations, but at the same time do not ignore the available source of endless information and analytics. Sooner or later, a symmetrical response to new threats will be developed. In particular, there are already a number of studies on creating language models capable of detecting content generated by tools such as ChatGPT, which will combat the increase in social engineering attacks. We live in an era of artificial intelligence tools, and these tools are likely to identify each other based on the unique ‘signature’ of a particular model (we remember that classification after training on generic datasets is one of the classic applications of machine learning). Other contextual metrics such as time and location can also be used to determine whether a message is genuine or not.

That said, those who choose not to use these accelerators of our thinking, the tools and benefits of today's ever-changing technology, should know that attackers are already using them, and in that case the attacker will be many steps ahead.