Ruslan Rakhmetov, Security Vision
When ensuring information security, the use of cryptographic protection methods is not always feasible for technical and organizational reasons, and the use of patent law for legal protection requires the publication of a description of the invention and technical documentation, which is not always desirable. Obfuscation techniques are used by defenders and attackers to conceal information and complicate the analysis (reverse engineering) of systems. The goal of obfuscation is not to guarantee protection of systems from unauthorized access and analysis, but to increase resource consumption so much that such analysis is economically unfeasible for a potential adversary. In the previous section of this article, we discussed the basics of obfuscation and some of its types, and in this publication, we will discuss the use of obfuscation in software and hardware.
1. Hardware obfuscation
When creating a complex product, private companies and governments invest significant budgets and the efforts of designers, engineers, and architects, especially when developing cutting-edge models. Such products attract the interest of competitors and foreign intelligence agencies, who seek to obtain innovative products and then reverse engineer the hardware (hardware reverse engineering): understand the structure of the product and its operating principle, create a 3D model of the device, recreate the design documentation, and ultimately be able to produce a similar or improved product, its components, and spare parts independently, sometimes without the knowledge of the rights holder. A similar approach is used in industry (metallurgy, automotive, aircraft manufacturing), medicine, and microelectronics. Reverse engineering of microcircuits is complicated by the small size of the devices and their multi-layered structure, so analysts resort to decapsulation (opening the package to gain access to the die), etching the chip layers with acids, scanning electron microscopes (SEM) and focused ion beam (FIB) systems, the price of which can reach hundreds of thousands or even millions of dollars, as well as newer methods such as ptychographic X-ray laminography (PyXL). Naturally, the creators of the original products and the rights holders do their utmost to prevent such unauthorized copying and industrial espionage. In microelectronics, for example, protections include measures against package removal and layer-by-layer recovery of the chip layout, shielding of chip elements (in particular, EEPROM modules), destruction of chip components under certain external physical impacts, and protected memory types (MRAM, antifuse, ROM).
Developers can apply invisible, unique, traceable, and tamper-resistant watermarks to products in digital (developer data is written into the ROM of components), analog (invisible voltage/current changes), and structural (component arrangement within the device) forms. In addition to the theft of know-how and subsequent counterfeit production, reverse engineering can be used to detect hardware implants embedded in microcircuit components; at the microprocessor level, however, such attacks have so far only been studied in laboratory conditions. Although there are no widely known cases "in the wild" of implants being introduced directly into microprocessors at the hardware level, backdoors are still embedded in BIOS/UEFI and device firmware, or operate under the guise of vulnerabilities and undeclared capabilities in various technologies and microcircuit components.
Hardware backdoors are similar in functionality to hardware Trojans and "kill switches". Backdoors are intentionally embedded by the chip developer (on their own initiative or at the direction of a government agency), while hardware Trojans are introduced into chips by third parties: a CAD company involved in the chip design, a supplier of IP blocks (intellectual property cores, i.e., complex functional blocks used in designing integrated circuits), or the factory where the chips are actually manufactured. According to the accepted taxonomy, hardware Trojans can be embedded at various stages of a chip's creation (specification, architecture development, production, testing, assembly/packaging/installation) and operate at various abstraction levels. They can be activated by an external or internal trigger, be embedded in various components of the chip (processor, memory, internal peripherals, power supply, clock generator), and perform various unauthorized actions: modifying the chip's functionality, reducing its performance, causing denial of service, or leaking information or encryption keys through side or covert channels. Kill switches, in turn, are designed to disable a microelectronic device upon an external command or when certain conditions are met. Much like the automatic locking of a smartphone after a theft attempt, developers or intelligence agencies can embed such functionality into various products to enable their remote shutdown. In general, the introduction of hardware Trojans and kill switches, i.e., attacks on the supply chain of microelectronic devices and electronic components, is extremely resource-intensive and accessible only to highly sophisticated attackers: intelligence agencies, scientific institutions, and government agencies.
2. Software obfuscation
Legitimate developers may protect programs from analysis for a number of purposes: for example, to protect intellectual property and copyrights; to prevent analysis, cloning, cracking, and illegal or unlicensed use of the program; and to prevent interference with the program's operation, including by malware. Attackers also seek to protect their malware samples from analysis by virus analysts, from automatic analysis by antivirus programs, and from sandboxes. In both cases, protection can be achieved through obfuscation: the program continues to operate (perhaps at a reduced speed), but its analysis and research become more difficult. Source code, bytecode, and compiled binaries can all be obfuscated. To understand the obfuscation mechanism, it is important to consider that programming languages are divided into compiled, interpreted, and hybrid:
1) Compiled: the source code (text) of the program is passed to a compiler, which creates a binary executable file consisting of headers, machine code (instructions), and data. Such a file can only be run on a specific OS and processor architecture, depending on the compiler used (for example, an .exe file for running in a Windows environment on a PC with an x86-64 architecture). This type of compilation is also called AOT (Ahead-of-Time) compilation.
2) Interpreted: the program's source code is passed to an interpreter for execution (for example, commands in the Linux terminal are executed by the Bash interpreter). In some interpreted languages, the source code is first compiled into bytecode (intermediate, portable, architecture-independent code), and this bytecode is then executed by the interpreter. This approach ensures cross-platform compatibility: the same source code can be run on different operating systems and processor architectures. For example, the text of a Python program saved in a .py file is compiled into bytecode in the .pyc format and then executed by the CPython interpreter, which runs on all popular operating systems. Another example: the source code of Android programs is written in Java or Kotlin, then compiled into bytecode in the DEX (Dalvik Executable) format, and the .dex files are then packaged into APK files; on Android devices, they run in the Android Runtime (ART) environment, which translates the bytecode into instructions for the processor of a specific device. Programs in interpreted languages are easier to test and debug because the interpreter executes the program instructions sequentially and immediately reports errors, and the compilation step is either omitted or happens unnoticed. However, programs in interpreted languages run slower because of the intermediate layer (the interpreter), while compiled programs are executed directly by the processor.
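The source-to-bytecode step described for Python can be observed from within the interpreter itself. The snippet below is a minimal sketch, assuming CPython; the sample function add and the file name "<example>" are made up for illustration. It compiles source text into a code object (the in-memory form of what a .pyc file stores) and lists the instructions with the standard dis module. The exact instruction names differ between CPython versions.

```python
import dis

# Hypothetical source text; any small function would do.
SOURCE = """
def add(a, b):
    return a + b
"""

# compile() turns source text into a code object, the in-memory form of
# what CPython stores in .pyc files.
module_code = compile(SOURCE, "<example>", "exec")

# Execute the module body so that the function object is created.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The raw bytecode of the function is a plain bytes object.
raw_bytecode = add.__code__.co_code

# dis decodes those bytes into named instructions.
opnames = [instr.opname for instr in dis.get_instructions(add)]

print(raw_bytecode.hex())
print(opnames)
```

The same bytecode runs unchanged on any platform with a compatible CPython version, which is the portability property mentioned above.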
3) Hybrid (conditionally compiled): to make a program more universal and cross-platform (able to run on different operating systems and processor architectures), a hybrid approach is used: the source code is converted into an intermediate language, which is then translated by the runtime environment into machine code for a specific platform. For example, the source code of a program in C# or Visual Basic .NET is translated into the intermediate language CIL (Common Intermediate Language), which the .NET runtime (CLR, Common Language Runtime) then compiles into machine code by means of dynamic JIT (Just-In-Time) compilation, directly at the time of execution on a specific OS and platform. Similarly, JIT compilation is used for platform independence and speed in the JVM (Java Virtual Machine) when running Java programs, and in modern JavaScript engines (V8 from Google, SpiderMonkey from Mozilla), which translate resource-intensive sections of JavaScript code into optimized machine code that is stored in memory and executed directly by the processor. It should also be noted that the complexity of the optimizers in JIT engines and the way they work with memory (writing compiled code to a memory region and then executing it) lead to dangerous vulnerabilities (according to the Microsoft Edge team, about 45% of the vulnerabilities reported for the V8 engine were related to the JIT compiler) and to incompatibility with OS memory protection features (Arbitrary Code Guard, Control Flow Guard, Hardware-enforced Stack Protection, Control-flow Enforcement Technology). This is why some browser vendors recommend disabling JIT optimization to reduce the attack surface. In Android, increased security is achieved by disabling the V8 JavaScript engine optimizers in Advanced Protection mode; in Microsoft Edge, JIT compilation can be disabled via Enhanced Security mode; in Google Chrome, the JavaScript optimizer can be disabled in the settings; in Mozilla Firefox, JIT optimization can be disabled through the about:config service page by setting the following parameters to "false": javascript.options.baselinejit, javascript.options.ion, javascript.options.asmjs, javascript.options.wasm_baselinejit, javascript.options.wasm_optimizingjit.
Obfuscation is used to complicate program reverse engineering, which can be carried out using debuggers (which allow a program to be analyzed while it is running), disassemblers (which translate the binary file's machine code into low-level assembly code), and decompilers (which translate the binary file's machine code into human-readable code in a high-level language). We'll discuss these tools in detail in the next article, but for now, it's important to understand that a decompiler only attempts to "guess" the program's source code and cannot restore it exactly. Original comments and function and variable names will most likely be lost, and the code structure will be altered. It's also important to note that recovering source code from native binaries is more difficult than from intermediate code and bytecode (e.g., from Java or C# programs, or from DEX files). For example, attackers creating malware for Android often compile their code into native libraries, since reverse engineering compiled binaries is more difficult than reverse engineering DEX bytecode. The goal of obfuscation is to complicate such decompilation, using automated tools called obfuscators, which are used by both attackers and legitimate software developers. However, obfuscation does not guarantee that reverse engineering will fail or that the source code cannot be restored; rather, it aims to complicate reverse engineering to the point where it becomes impractical and ineffective from the adversary's perspective.
In addition to obfuscation, server-side code execution (in web applications) and cryptographic methods (encryption, digital signatures) can be used to protect code. Compared to encryption, obfuscation is not regulated (there are no export or legislative restrictions), there is no need to use or securely store encryption keys, and no specialized equipment is required to perform cryptographic transformations, which is why obfuscation can be the preferable option in some cases. On the other hand, obfuscation complicates the creation and debugging of programs for developers, increases the size of the final executable file, and can significantly slow the program down. That said, some obfuscators can also optimize code.
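The difference between what bytecode retains and what is lost can be checked directly in Python. The following is a small sketch, assuming CPython; the function check_license, its variables, and the key string are hypothetical. It shows that a compiled code object still carries the function name, local variable names, and string constants, while comments do not survive compilation. This retained metadata is one reason bytecode is easier to decompile than native machine code.

```python
# Hypothetical source; the names and the key are invented for illustration.
SOURCE = '''
def check_license(user_key):
    # Secret comment: compare against the built-in key
    expected_key = "DEMO-1234"
    return user_key == expected_key
'''

namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
code = namespace["check_license"].__code__

function_name = code.co_name      # name of the function
local_names = code.co_varnames    # argument and local variable names
constants = code.co_consts        # literals used by the function

# Comments are discarded by the compiler and never reach the bytecode.
comment_survived = any(
    isinstance(c, str) and "Secret comment" in c for c in constants
)

print(function_name, local_names, constants, comment_survived)
```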
According to the accepted taxonomy, obfuscation methods (code transformations) are divided into the following types:
1. Layout obfuscation:
· Scrambling (renaming) variable and function names;
· Changing formatting;
· Deleting comments;
· Removing debug information (considered one of the simplest and most effective ways to make reverse engineering more difficult).
2. Data obfuscation and methods of so-called "false refactoring":
· Storage and encoding: splitting variables, converting static data into procedural data, wrapping scalar variables into objects, changing encoding;
· Aggregation: combining scalar variables, changing inheritance relationships, restructuring arrays;
· Ordering: reordering variables, methods, arrays.
3. Control flow obfuscation:
· Aggregation: inlining functions, outlining statements into a separate function, interleaving (merging) and cloning functions, unrolling loops;
· Ordering: reordering expressions, operators, loops;
· Computation: table interpretation, transformation of a reducible control flow graph into an irreducible one, elimination of library calls, addition of unreachable (never executed) and dead (not affecting the result) code, complicating loop conditions.
4. Preventive transformations:
· Exploiting weaknesses and peculiarities of specific tools (decompilers, deobfuscators) to complicate reverse engineering;
· Exploiting the weaknesses and peculiarities of deobfuscation techniques.
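Layout obfuscation from the first group of the taxonomy above can be sketched with Python's standard ast module. This is a deliberately naive illustration, not how production obfuscators work: the sample function total_price and the class RenameIdentifiers are hypothetical, every name is renamed blindly (so built-ins or imported names would break), and ast.unparse requires Python 3.9 or later. Comments disappear as a side effect, because the parser does not keep them in the syntax tree.

```python
import ast

# Hypothetical readable source.
SOURCE = '''
def total_price(unit_price, quantity):
    # Multiply price by quantity
    subtotal = unit_price * quantity
    return subtotal
'''


class RenameIdentifiers(ast.NodeTransformer):
    """Replace function, argument, and variable names with meaningless ones."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Assign the next free alias (_0, _1, ...) on first sight of a name.
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


renamer = RenameIdentifiers()
tree = renamer.visit(ast.parse(SOURCE))
obfuscated = ast.unparse(tree)  # regenerate source text without comments

print(obfuscated)
```

The renamed function still computes the same result; only the information useful to a human reader is gone.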
For more information on obfuscation techniques and examples, please follow the links:
· https://citforum.ru/security/articles/obfus/
· https://citforum.ru/security/articles/analysis/
· https://sharcus.blogspot.com/2011/06/blog-post.html
· https://www.sciencedirect.com/science/article/pii/S1877050915032780
· https://www.researchgate.net/publication/235611093_Techniques_of_Program_Code_Obfuscation_for_Secure_Software
· The International Obfuscated C Code Contest (IOCCC) page
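The control flow group of the taxonomy (unreachable and dead code, complicated conditions) can be illustrated with a so-called opaque predicate: a condition whose outcome is fixed but not obvious to the reader. The functions below are a hypothetical sketch written for this purpose, not the output of any particular obfuscator.

```python
def is_even(n):
    """Original, readable version."""
    return n % 2 == 0


def is_even_obfuscated(n):
    """Same result, wrapped in an opaque predicate and junk code."""
    # Opaque predicate: n*n + n equals n*(n+1), a product of two
    # consecutive integers, so it is always even and the test is always true.
    if (n * n + n) % 2 == 0:
        result = (n & 1) ^ 1  # 1 when n is even, 0 when n is odd
    else:
        # Unreachable branch that exists only to mislead the reader.
        result = (n * 7 + 3) % 5
    junk = (n << 3) - n * 8  # dead code: always 0 and never used
    return bool(result)


# Both versions agree on a range of inputs, including negative numbers.
agree = all(is_even(n) == is_even_obfuscated(n) for n in range(-50, 51))
print(agree)
```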
In general, binary (machine code) obfuscation is achieved by adding redundant branches, loops, and functions. This happens at compile time and intentionally complicates the program logic. Source code obfuscation is achieved by deliberately writing so-called "spaghetti code" (poorly structured, difficult-to-understand code), adding redundant constructs, and removing comments. As an example of source code obfuscation techniques, consider simple Python code for adding two user-entered numbers:
def sum(a, b):
    return a + b

# Get two numbers from the user
a = int(input('Enter 1st number: '))
b = int(input('Enter 2nd number: '))

# Print the sum of the numbers
print(f'Sum of {a} and {b} is {sum(a, b)}')
And here is what the same program would look like after obfuscation: the builtins module is imported dynamically via the __import__() function, code is executed dynamically via the exec() function, letters are replaced with their ASCII codes, the addition a+b is replaced with the operation a-(-b), and comments are removed:
_m = __import__('builtins')
_fn_name = ''.join([chr(115), chr(117), chr(109)])
exec(f"def {_fn_name}(*_):"
     f" return (_[0] - (-_[1]))")
_p1 = ''.join(map(chr, [69,110,116,101,114,32,49,115,116,32,110,117,109,98,101,114,58,32]))
_p2 = ''.join(map(chr, [69,110,116,101,114,32,50,110,100,32,110,117,109,98,101,114,58,32]))
_a = getattr(_m, 'int')(getattr(_m, 'input')(_p1))
_b = getattr(_m, 'int')(getattr(_m, 'input')(_p2))
_result = globals()[_fn_name](_a, _b)
getattr(_m, 'print')(f"{''.join(map(chr,[83,117,109]))} of {_a} and {_b} is {_result}")
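Obfuscation of this kind slows a reader down but is easy to undo mechanically. As a small sketch, the character-code lists from the listing above can be decoded back into the original strings; the ENCODED dictionary and the decode helper are names introduced here for illustration only.

```python
# Character-code lists copied from the obfuscated listing above.
ENCODED = {
    "_fn_name": [115, 117, 109],
    "_p1": [69, 110, 116, 101, 114, 32, 49, 115, 116, 32,
            110, 117, 109, 98, 101, 114, 58, 32],
    "_p2": [69, 110, 116, 101, 114, 32, 50, 110, 100, 32,
            110, 117, 109, 98, 101, 114, 58, 32],
    "label": [83, 117, 109],
}


def decode(codes):
    """Turn a list of character codes back into text."""
    return "".join(map(chr, codes))


decoded = {name: decode(codes) for name, codes in ENCODED.items()}

for name, text in decoded.items():
    print(f"{name} = {text!r}")
```

An analyst would recover the function name and both prompts this way, which matches the point made earlier: obfuscation raises the cost of analysis without making it impossible.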
Such obfuscation is often carried out with the help of obfuscator programs such as StarForce C++ Obfuscator, Themida, Digital.ai Application Security, O-MVLL, and Obfusk8. Also common are .NET obfuscators, tools for obfuscating scripting languages (Invoke-Stealth for PowerShell, Blind Bash for Linux Bash, BatchObfuscator for Windows Batch, Pyarmor for Python), and other tools. Obfuscation can also be achieved by encrypting strings, API keys, URLs, and domains using a bitwise XOR operation with a secret key, as well as by encrypting a block of data in a binary file. In the case of malware, decryption occurs while the dropper or loader is running, directly in RAM, which helps evade detection by network and file protection tools. Furthermore, obfuscation can be a side effect of minification, encoding, and data compression, as well as of optimizing compilers and packers such as UPX, MPRESS, PECompact, and Alternate EXE Packer (packers are actively used by attackers, which leads to false positives from security tools on legitimately packed files).
In conclusion, we note that in the MITRE ATT&CK matrix, the attackers' tactical goal of avoiding detection (the "Defense Evasion" tactic) can be achieved through the technique "Obfuscated Files or Information", which includes a number of sub-techniques:
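The XOR-based string hiding mentioned above can be sketched in a few lines. This is an illustrative example only: the key, the URL, and the helper name xor_bytes are hypothetical, and a repeating-key XOR provides obscurity rather than cryptographic protection.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


KEY = b"k3y"                            # hypothetical secret key
plain_url = "https://example.com/api"   # placeholder string to hide

# What would be stored in the program instead of the readable string.
hidden = xor_bytes(plain_url.encode("utf-8"), KEY)

# XOR is its own inverse, so applying the same key restores the text.
restored = xor_bytes(hidden, KEY).decode("utf-8")

print(hidden.hex())
print(restored)
```

Because the same function both hides and restores the data, the key must be present somewhere in the program, which is why such strings can be recovered by an analyst who locates it.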
· Binary Padding: attackers can add junk data to a malicious file to increase its size beyond what security tools and sandboxes support;
· Software Packing: attackers can use packers and virtual-machine-based program protection that launches when the code is executed;
· Steganography: a technique for hiding data in various digital objects (images, audio and video recordings);
· Compile After Delivery: attackers can deliver malicious code to the target system as text (possibly encrypted or obfuscated), which is then compiled into an executable file on the target device itself;
· Indicator Removal from Tools: attackers can change the typical indicators of compromise (for example, change the set of domains or IP addresses that the malware accesses) or rebuild the malware to change the hash of the file;
· HTML Smuggling: as we have written before, with the "HTML Smuggling" technique, JavaScript code placed on a web page triggers the download of a binary object (JS blob), whose contents are decoded by the browser on the local PC and then saved to disk, after which the user is encouraged to launch it under various pretexts;
· Dynamic API Resolution: the operating system API functions that malware calls can tell information security analysts a lot about how it works, so attackers use methods to hide the API functions used (for example, they use hashes or identifiers to call API functions indirectly);
· Stripped Payloads: attackers remove human-readable information (strings, symbols) from the source code and at compile time to make reverse engineering of the malware more difficult;
· Embedded Payloads: attackers hide malicious payloads in other files (e.g., DLL, LNK, PNG);
· Command Obfuscation: attackers make commands hard to read, for example, by using Base64 or URL encoding, by splitting commands into parts (for example, "ShellEx"+"ecute"), or by using unusual formatting;
· Fileless Storage: attackers store malware not as files, but in RAM, the registry, and the WMI repository on the attacked device;
· LNK Icon Smuggling: attackers exploit metadata fields in LNK files to hide malicious content (similarly, malicious content can be placed in the metadata of other file types, such as office documents);
· Encrypted/Encoded File: attackers use encryption and encoding of malicious content to prevent it from being detected by security tools;
· Polymorphic Code: polymorphic (mutating) code changes itself at run time, so it looks different on each new run, which helps to avoid detection;
· Compression: attackers can use tools to compress and archive files, encrypt the archive and set a password for opening it (which helps bypass security measures), and create self-extracting archives;
· Junk Code Insertion: attackers add junk constructs to the original malicious code to complicate code analysis;
· SVG Smuggling: attackers exploit features of SVG graphic files that can display HTML and execute JavaScript when the image is loaded, subsequently generating a malicious object locally.
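To show why the "Command Obfuscation" sub-technique from the list above defeats simple text matching, here is a harmless sketch from the defender's point of view. The command "echo hello", the script fragment, and the helper naive_signature_match are all made up for illustration; real detection tools normalize and decode content before matching.

```python
import base64

command = "echo hello"  # harmless placeholder command

# 1) Encoding: the readable command no longer appears in the text.
encoded = base64.b64encode(command.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

# 2) Splitting: the full name never appears as one literal in the script.
script_text = 'call("ShellEx" + "ecute")'  # hypothetical script fragment
api_name = "ShellEx" + "ecute"             # value seen only at run time


def naive_signature_match(text: str, signature: str) -> bool:
    """Simplistic detection: look for the signature as a plain substring."""
    return signature in text


print(encoded)
print(naive_signature_match(script_text, "ShellExecute"))  # misses the split name
print(naive_signature_match(api_name, "ShellExecute"))     # matches after evaluation
```

The substring check fails on the raw script but succeeds on the evaluated value, which is why analysis tools try to decode and reassemble such content before applying signatures.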