Ekaterina Gainullina, Security Vision
Databases turn into an "open book" when confidential information in them becomes accessible to intruders or the general public through a leak. Unfortunately, 2024-2025 brought many such leaks across a wide range of industries. According to Roskomnadzor, 135 database leaks were recorded in Russia alone in 2024, affecting more than 710 million records about Russian citizens. Retail and government organizations led in the number of leaks. The trend is similar worldwide: globally, the number of leaks and compromised records keeps breaking records. This article analyzes recent high-profile leaks by sector (energy, public sector, e-commerce, etc.) and examines their technical causes: from open NoSQL ports and leaked backups to vulnerable CI/CD pipelines. It also gives practical recommendations on how to keep your database from becoming a public library. At Security Vision, we consider these tasks central to the development of next-generation solutions, including in database security automation and secure development.
Leaks 2024-2025: industry panorama
Almost no industry has avoided data leaks over the last two years. Below we look at some illustrative cases from different sectors, along with their causes and consequences.
Energy sector (Fuel and energy complex)
Companies in the fuel and energy sector have experienced many leaks and cyber incidents. The industry is considered critical infrastructure, and attacks on it can have far-reaching consequences. According to a Cybernews report, more than 50% of the world's largest oil and gas companies experienced at least one data leak over a 30-day period (as of May 2025). This is a staggering statistic that reflects wide gaps in the cybersecurity of energy companies: 69% of them received the lowest grades (D or F) for their protection. Mundane things are usually to blame: unapplied patches, vulnerable web services, leaked employee credentials. The same report noted that credentials (logins/passwords) were stolen from 80% of companies in the industry, and compromised administrator passwords often lead straight to critical systems, including SCADA and databases.
One of the high-profile episodes was the unauthorized access at Halliburton, a major supplier of oilfield services. In August 2024, Halliburton discovered unauthorized access to its systems, after which it disconnected some of its networks and informed regulators about the breach. Despite the scarcity of official information about the nature of the attack, it is known that the company suffered losses of about $35 million from downtime and incident response (according to media estimates). The attackers likely got in through a vulnerability in the internal network or through compromised credentials; such targeted attacks on the energy sector are being recorded more and more often. For example, a year earlier, in March 2023, Hitachi Energy confirmed a data leak resulting from a Clop ransomware attack that exploited a 0-day in the GoAnywhere file transfer software. In other words, unauthorized access to energy companies' data is often achieved through vulnerable third-party services: file storage, VPN hubs, or mail gateways, after which the attackers copy databases of confidential information.
Leaks related to operational technology are especially dangerous for the fuel and energy sector. Imagine that a database containing equipment operating parameters, network diagrams, or SCADA system logs becomes publicly available: it is literally an instruction manual for attackers on how to disable critical facilities. Fortunately, there have been no catastrophic leaks of this kind in the energy sector so far, but the trend is alarming: data-extortion attacks on the industry are on the rise.
Government data and public sector contractors
Government organizations handle huge amounts of citizens' data, which is why their leaks resonate the most. In 2024-2025, the unhappy series of public sector data compromises continued around the world. Attackers often target IT contractors and service providers working for the state, using them to reach departmental databases.
In mid-2023, the MOVEit file transfer service, used by hundreds of organizations including government agencies, was breached at scale. Among those hit were the US Department of Energy and health-agency contractors. One of the victims was the American company Maximus, which processes health insurance data for the government. The Clop ransomware group, exploiting the MOVEit vulnerability, stole from Maximus the personal information of 8-11 million people (US citizens whose data was processed as part of social programs). The leak included full names, contact details, Social Security numbers, and health information: in effect, an entire registry of public service clients ended up in the hands of extortionists. This case highlights the supply chain problem: even if the government's own databases are protected, a weak link in the form of third-party software or a contractor leads to data compromise.
Another example is the leak of 237,000 records on US federal government employees from Department of Transportation (DOT) systems. It was disclosed in July 2023 that data on employees enrolled in the travel compensation (transit benefits) program had become available to intruders. The internal DOT application that administers the benefits was probably breached, and registers with the names, positions, and partial financial information of officials were extracted from its database. Such a leak can fuel targeted phishing of government employees, blackmail, and so on.
E-commerce and online services
E-commerce platforms and online services hold huge databases of user data, and after a leak they often become exactly such an "open book". Attackers hunt for accounts, order histories, and payment information of customers of online stores, delivery services, booking systems, and the like.
In May 2024, data of Ticketmaster, the global ticketing operator (part of Live Nation), was leaked. Hackers penetrated Ticketmaster's systems and extracted a staggering amount of data: 560 million order records plus customers' personal data. Offers to sell this database appeared on the darknet. The leaked information included names, addresses, emails, order history, and partial payment details of Ticketmaster users. The technical details of the breach were not disclosed, but it is known that the attackers had long-term access to the internal network. They probably exploited a vulnerability in a web application or API, or compromised administrator credentials. This incident is one of the largest of 2024 by number of people affected, and it demonstrates that e-commerce companies must protect themselves not only from DDoS and carders, but also from targeted attacks on their databases. For Ticketmaster, the consequences included regulatory scrutiny, lawsuits, and the need to notify millions of users worldwide.
E-commerce platforms also often fall victim to trivial configuration errors. For example, in 2024, researchers discovered many open databases used by web and mobile applications (including trading platforms). Over 900 sites had incorrectly configured cloud databases, leaving the data of 125 million users publicly available, including names, phone numbers, email addresses, plaintext passwords, private messages, and even bank details. The affected services came from various fields (learning platforms, CRM systems, restaurants with online ordering, etc.). This was not a hacker attack per se: the databases were simply left open because no access rules had been configured. Virtually anyone could read (and in some cases modify) this data simply by sending a request to the Firebase endpoint. Notably, some companies did not even respond to the researchers' notifications about the exposure. An incident of this scale shows how the human factor and configuration errors can nullify all other security efforts.
How data becomes an "open book": ways of leaks
As the cases above show, an attacker does not always "hack" your DBMS directly by brute-forcing a PostgreSQL password or exploiting a MongoDB vulnerability. More often, a leak is preceded by a chain of events that turns private data into public data. Let's look at the main vectors through which confidential data "leaks" out of databases, often in unexpected ways.
Open ports, default passwords, and vulnerable settings
Direct exposure of a database to the Internet without proper protection remains one of the most frequent causes of compromise. This is the trivial scenario: the DB server is reachable via a public IP and uses a weak password, or requires no authentication at all. Thousands of such cases were still being found in 2024:
● Shodan searches regularly reveal MongoDB and Elasticsearch instances open without a password. Automated bots instantly find such nodes, copy the data, and often leave a ransom note. In one well-known ransomware campaign against MongoDB administrators, attackers erased the contents of ~23,000 MongoDB databases and left a ransom note (~0.015 BTC) in their place. They threatened to publish the stolen data and report the company to regulators (for violating GDPR) if they were not paid. Attacks of this type were first observed back in 2017, but spikes recur regularly, all because administrators keep deploying NoSQL on an "it works, ship it" basis, forgetting to enable authentication and close the port.
● Redis is another favorite target. This in-memory database is normally used inside local networks, but is sometimes mistakenly exposed externally. Without a password, Redis lets a client execute a range of commands, and techniques are well known for writing an SSH key into the ~/.ssh/authorized_keys file through an open Redis instance, thereby gaining access to the server. There have been incidents where backup.sql files containing a ransom demand appeared on victims' servers: the attackers used Redis to download and launch malware and exfiltrated other data.
● Microsoft SQL Server (MSSQL) is also under massive attack. Research shows that botnets actively brute-force MSSQL in order to take over servers with trivial passwords. In 2023-2024, the FreeWorld/Mimic ransomware campaign surfaced, with a distinctive sequence of actions: the bot guesses the administrator password on an exposed MSSQL server, then uses that access as a springboard: it downloads malicious DLLs via xp_cmdshell, deploys a Cobalt Strike beacon, steals more passwords, and finally encrypts files on the server. The attack was dubbed DB#JAMMER. Notably, its main precondition for success is an open MSSQL port plus weak credentials. Securonix analysts note that the compromised servers almost always had default or very simple passwords ("12345", "password", etc.). In addition, the services were reachable from outside, whereas recommendation No. 1 is never to publish a DBMS directly to the Internet. In short, if your SQL server is on public display, and running under sa/sa123 at that, expect trouble.
Conclusion: Do not expose databases unnecessarily. If you need external access, use firewalls, lists of allowed IP addresses, VPNs, connection encryption, and strong passwords. Disable or restrict dangerous functions (for example, xp_cmdshell in MSSQL). And constantly monitor whether your ports 3306/5432/27017/6379, etc. are "glowing" on the Internet.
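The last point, checking which ports are visible, is easy to automate. Below is a minimal sketch using only the Python standard library; the port list is illustrative, and the check only confirms TCP reachability (a real audit should also verify that authentication is enforced on anything that answers):

```python
import socket

# Common database ports worth checking for accidental public exposure.
# Extend the dictionary for your own stack.
DB_PORTS = {3306: "MySQL", 5432: "PostgreSQL", 27017: "MongoDB",
            6379: "Redis", 1433: "MSSQL"}

def exposed_db_ports(host: str, timeout: float = 1.0) -> list:
    """Return (port, name) pairs on `host` that accept a TCP connection."""
    open_ports = []
    for port, name in DB_PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when the TCP handshake succeeds
            if s.connect_ex((host, port)) == 0:
                open_ports.append((port, name))
    return open_ports
```

Run it against your public IPs from an outside network: anything it reports should be deliberately exposed, or closed immediately.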
CI/CD and DevOps as a new perimeter: leaks through pipeline
Modern development involves the use of continuous integration and delivery (CI/CD), version control systems, containerization – everything that makes a developer's life convenient. But a mistake in these tools can open access to data even more widely than the vulnerability of the database itself.
Recall the Byte Federal case: a vulnerability in GitLab led to the compromise of a server holding user data. There are many similar cases:
● In January 2023, the popular CI service CircleCI was breached. The attackers gained access to the environment variables of many projects (including access keys to databases and cloud storage). Companies were forced to urgently rotate all their secrets. This case is a warning: it is dangerous to store secrets (passwords, API keys) directly in pipeline variables or in the repository. It is better to use secrets-management services (Vault, AWS Secrets Manager) or, at the very least, to encrypt the variables.
● The Uber 2022 incident (slightly before our period, but instructive): a hacker gained administrative access to Uber's internal resources after finding, in an accessible repository, a PowerShell script containing a password for the privileged access management system. From there it was a matter of technique: acting as a domain administrator, he exfiltrated reports from Uber's internal databases. One forgotten credential in a script turned into a leak of sensitive data and a complete compromise of the infrastructure.
● Leaked secrets on GitHub are a huge problem. According to GitHub, in 2024 its automated scanner detected more than 39 million secrets accidentally pushed to public repositories (API keys, passwords, and other credentials). Every such leaked secret is a potential key to a database or storage bucket. GitHub notes that developers often sacrifice security for convenience: they commit configs with passwords because "it's faster this way", or never get around to purging a secret from the commit history. The fix: strict pre-commit hooks and placeholder credentials in the code (with the real values injected from environment variables on the server). It is also worth enabling Push Protection, which blocks a push if it detects a known secret pattern in it.
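The pre-commit scanning idea can be sketched in a few lines of Python. The patterns below are illustrative only; real tools such as Gitleaks or git-secrets ship with far larger, battle-tested rule sets:

```python
import re

# Illustrative detection rules; production scanners use hundreds of them.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private keys
    re.compile(r"(?i)(password|passwd|secret)\s*[=:]\s*['\"][^'\"]{4,}['\"]"),
]

def find_secrets(text: str) -> list:
    """Return (line_number, matched_fragment) pairs for suspicious lines."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            m = pattern.search(line)
            if m:
                hits.append((lineno, m.group(0)))
    return hits
```

A pre-commit hook would run this over the staged diff and abort the commit on any hit.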
Compromising CI/CD can have other consequences as well. For example, malicious code that gets into the pipeline can run pg_dump against your database and send the result to the attackers, or embed itself into container images.
Example of a vulnerable scenario: Imagine a Dockerfile that copies the entire contents of a project inside a container:
FROM python:3.11-slim
WORKDIR /app
# copies ALL project files into the image, including any .env
COPY . .
RUN pip install -r requirements.txt
...
If the project folder contains a .env file with DATABASE_URL, passwords, and so on, such a Dockerfile will bake those secrets into the final image. If the image is then published to a shared registry (for example, Docker Hub), attackers can pull it and extract the confidential information. The correct move is to add .env to .dockerignore and never copy it. But the human factor rules: statistically, hundreds of public repositories on GitHub contain Compose files with database passwords or Firebase keys. These trivial leaks are among the most common.
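One cheap safeguard is a pre-build check that refuses to build if the context contains secret-like files. A minimal sketch (the file-name list is illustrative and should be extended per project):

```python
import os

# File names that should never end up in a Docker build context.
FORBIDDEN = {".env", "id_rsa", "credentials.json", "terraform.tfstate"}

def risky_files(context_dir: str) -> list:
    """Walk the build context and return paths of files that look like secrets."""
    found = []
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            if name in FORBIDDEN or name.endswith(".pem"):
                found.append(os.path.join(root, name))
    return found
```

Wire it into the build script: if `risky_files(".")` is non-empty, print the offending paths and exit non-zero before `docker build` ever runs.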
Thus, DevOps tooling and processes should be treated as part of the attack surface. Regularly audit your CI/CD configurations:
● Make sure that the build logs do not contain sensitive data (otherwise anyone with access to CI can see them).
● Use separate accounts/roles with limited rights to deploy applications to the database (so that even if the pipeline is compromised, the attacker does not get full DBA access).
● Isolate environments: staging and test should not have access to production data. Conversely, the production database should not "know" about the existence of CI.
● Use IaC (Infrastructure as Code) scanners. They automatically find potential misconfigurations. Example: using the Open Policy Agent or specialized products, you can analyze your Terraform/CloudFormation manifests against rules like "an S3 bucket must not be public" or "a DB instance resource must not have a public IP".
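As a toy illustration of such an IaC rule, here is a check written in Python over `terraform show -json` output (real deployments would use OPA/Rego or a dedicated scanner). It assumes the standard AWS provider attribute `publicly_accessible` on `aws_db_instance` resources:

```python
import json

def public_rds_instances(plan_json: str) -> list:
    """Return addresses of aws_db_instance resources that would be
    created with publicly_accessible = true, given a
    `terraform show -json` plan document."""
    plan = json.loads(plan_json)
    resources = (plan.get("planned_values", {})
                     .get("root_module", {})
                     .get("resources", []))
    return [r.get("address") for r in resources
            if r.get("type") == "aws_db_instance"
            and r.get("values", {}).get("publicly_accessible")]
```

In CI, fail the pipeline whenever the returned list is non-empty; then nobody can open a database to the world even by mistake.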
Backups, dumps, and staging
Another huge leak channel is database backups. Admins and developers often make dumps for backup or to move data to a test server. If the process is not organized securely, such a dump can simply walk away.
Typical problems:
● The backup file is left on an open file server or in a cloud bucket with no access restrictions. In one case, applicants' resumes and employee data leaked from a large corporation because the HR department exported a dump from Oracle and put it on a shared FTP server "for convenience". Someone indexed that FTP, and the files ended up in search results. These days, AWS S3 appears in place of FTP more often: there have been plenty of incidents where sensitive documents or SQL dumps sat in an S3 bucket with a public-read policy. For instance, researchers found an open bucket belonging to Breastcancer.org (a charitable foundation) containing thousands of cancer patient records, all because of a mistake in the access policy.
● Copying the production database to a test stand (staging) without anonymization. Developers take a dump of the production database to test against current data and deploy it on a test server. But staging environments are often poorly protected: they may use a simple password, or even sit open "so the client can have a look". As a result, the leak happens precisely from the test stand. Example: in 2023, one of the food delivery services in Russia was breached through an unsecured test environment, from which the attacker reached the order database (it was current because it was copied from production once a week). The solution: never use real personal data in testing, or at least anonymize it thoroughly. If real data is absolutely necessary, protect staging the same way as production (VPN, access monitoring, individual accounts).
● Storing backups in random corners. If the company has no centralized backup system, developers may keep copies locally: on laptops or external drives. A lost laptop or flash drive equals a compromised database. Alternatively, dumps sit on a remote server without proper protection. There have been cases where a company hired an outsourcer, handed over an SQL dump for analysis, and the outsourcer kept all client files in an open folder on its website, from where search bots could download them. Backup management must be an integral part of the security policy: encryption, an inventory of storage locations, access control.
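The anonymization advice can be made concrete. Below is a minimal sketch of deterministic pseudonymization in Python; the field names are illustrative. Salted hashing keeps a given value stable across tables, so foreign-key relationships in the anonymized dump still line up:

```python
import hashlib

def pseudonymize(value: str, salt: str = "rotate-me") -> str:
    """Deterministically replace a sensitive value with a stable pseudonym,
    so referential integrity survives anonymization."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()[:12]
    return f"user_{digest}"

def anonymize_row(row: dict, sensitive: tuple = ("name", "email", "phone")) -> dict:
    """Return a copy of `row` with sensitive fields replaced by pseudonyms."""
    return {k: (pseudonymize(v) if k in sensitive else v)
            for k, v in row.items()}
```

Note that simple pseudonymization is not full de-identification: rare values and combinations of attributes can still re-identify people, so treat it as a baseline, not a guarantee.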
A simple example of an incorrect S3 bucket configuration for backups:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-app-backups/*"
}]
}
This policy means that anyone on the Internet can download objects from the my-app-backups bucket. If prod-db-dump.sql sits there, its leak is only a matter of time. The correct approach is to restrict the Principal to specific service accounts/roles and to disable public access entirely.
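For contrast, a locked-down version of the same policy might look like this (the account ID and role name are placeholders; substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::123456789012:role/backup-reader" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-app-backups/*"
  }]
}
```

On top of the policy, enable S3 Block Public Access at the bucket (or account) level, so that even a later misconfigured policy cannot make the objects public.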
Metadata and ancillary leaks
Sometimes data becomes an open book not directly, but through leaks of auxiliary information. What might that be:
● Infrastructure configurations. The terraform.tfstate file, for example, often contains parameters in clear text, including database addresses, names, and even passwords if they are not marked as sensitive. Anyone who posts or steals your tfstate can learn a great deal about your environment, from server names to administrator password hashes. The same goes for Kubernetes manifests, docker-compose.yml, and similar files: they can contain DB connection strings. So never hardcode passwords into configs; use external secret providers, or at least templates with placeholders.
● Cloud metadata. In the famous Capital One breach (2019), the attacker reached the AWS instance metadata service via SSRF, pulled out the temporary keys of an IAM role, and used them to download confidential data from S3. The lesson: access to cloud instance metadata should be restricted, and cloud role permissions should be minimized. In the database context: if, say, the EC2 instance running the database has a role with access to backups, an attacker can download a database backup straight from AWS without ever breaking into the database itself.
● Non-obvious channels. This covers cases where fragments of confidential data leaked through seemingly harmless things: interface screenshots posted publicly (part of a database or an API key may be visible in the screenshot); swap files or memory dumps accidentally left on public resources; old hardware. A curious example: in 2025, an old server was sold on Avito, and the new owner discovered that the unencrypted database files of the previous owner, a small SaaS provider, were still on it. Nobody had bothered to wipe the disk! A leak can thus happen "offline" if you do not follow media disposal procedures.
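The tfstate risk mentioned above lends itself to mechanical checking. Here is a minimal sketch that walks a state file and flags secret-looking keys; the key patterns are illustrative, and a real scanner would also inspect values:

```python
import json
import re

# Key names that usually hold sensitive material; extend per your providers.
SUSPICIOUS_KEY = re.compile(r"(?i)(password|secret|token|private_key)")

def tfstate_secret_paths(state_json: str) -> list:
    """Return JSON paths in a terraform.tfstate document whose keys
    look secret-bearing and hold a non-empty string value."""
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            for k, v in node.items():
                if SUSPICIOUS_KEY.search(str(k)) and isinstance(v, str) and v:
                    found.append(f"{path}.{k}")
                walk(v, f"{path}.{k}")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                walk(v, f"{path}[{i}]")

    walk(json.loads(state_json), "$")
    return found
```

Run it as part of a periodic audit of wherever state files live (remote backends, CI artifacts); any hit is a candidate for moving into a secrets manager.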
DevSecOps Best Practices for Leak Prevention
Finally, the most important part: how to keep your database from becoming publicly available. Protection must be layered across every stage of the data lifecycle, from development to operations:
At the development stage
● Force the use of Git hooks: set up pre-commit and pre-push hooks that scan the code for secrets, keywords (password, PRIVATE KEY, etc.) and large binary files. There are ready-made solutions, such as git-secrets or Gitleaks. The goal is to block a commit if the developer accidentally added a DB password or dump to the code.
● Threat Modeling: When designing new features, consider which data is affected and how it may leak. If the feature involves creating a temporary data file, plan to encrypt and delete it immediately.
● Minimize data in tests: developers often need data for testing. Make it standard practice to use generated or anonymized data. There are tools that generate fake names, addresses, and card numbers to match the database schema; this is enough for functional tests. Real data should never wander onto developers' laptops.
● Secure coding practices: Train teams to recognize dangerous constructs (like the SQL injection above). Conduct training sessions on OWASP Top 10. The code review should include the following questions: "What if they put a quotation mark here?", "Is it possible to read this repository from the outside?", etc.
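The "minimize data in tests" point can be illustrated with a tiny generator. This sketch produces deterministic synthetic customer records; all names, domains, and field choices are made up for illustration and never touch production data:

```python
import random

def fake_customers(n: int, seed: int = 42) -> list:
    """Generate deterministic synthetic customer records for tests.
    Same seed -> same data, so test failures are reproducible."""
    rng = random.Random(seed)
    first = ["Anna", "Boris", "Clara", "Dmitry", "Elena"]
    last = ["Ivanov", "Smirnov", "Petrov", "Orlov", "Volkov"]
    rows = []
    for i in range(n):
        fn, ln = rng.choice(first), rng.choice(last)
        rows.append({
            "id": i + 1,
            "name": f"{fn} {ln}",
            "email": f"{fn.lower()}.{ln.lower()}{i}@example.test",
            "phone": f"+7-900-{rng.randint(1000000, 9999999)}",
        })
    return rows
```

For anything schema-aware (valid card numbers, addresses, referential integrity across tables), dedicated data-generation tools are a better fit than a hand-rolled script.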
In CI/CD and testing environments
● Secrets Management: Use centralized secret repositories. Integrate them with CI/CD so that the pipeline gets access to passwords and keys only for the duration of the build/deployment and does not store them in pure variables. For example, HashiCorp Vault or the built-in GitHub Actions (encrypted secrets) tools help with this.
● Separation of rights: service accounts that deploy must have minimal access. CI should not connect to the production database as a superuser. Create a separate database user with limited rights, one that can, for example, run the necessary schema migrations but cannot read all the tables.
● Log obfuscation: configure the CI system to mask secrets in logs (most support this: you can specify templates that need to be hidden). Then even if someone looks at the build log, they won't see the contents of the environment variables with passwords.
● Post-test cleaning: Automate the deletion of test data and backups that are created during the test run. Nothing should "hang" on the build agent or in the cloud storage after the pipeline is completed.
● Staging pentests and scans: regularly scan test environments, as well as production ones, for open ports, weak passwords, and known vulnerabilities. Conduct penetration testing at least once a year, including attempts to reach data through CI/CD.
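The log-obfuscation step above can be emulated for home-grown scripts when a CI system's built-in masking is unavailable. A minimal sketch, assuming you can enumerate the secret values in play:

```python
def mask_secrets(log_text: str, secrets: list) -> str:
    """Replace every occurrence of a known secret value with a fixed
    placeholder, mimicking what CI systems do for registered secrets."""
    masked = log_text
    for value in secrets:
        if value:  # never substitute on an empty string
            masked = masked.replace(value, "***")
    return masked
```

Pipe all script output through this before it reaches the build log; note that it only catches verbatim occurrences, so encoded or split secrets still require the CI system's native protection.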
When deploying and configuring infrastructure
● The principle of least privilege (PoLP): the database account that the application uses must have exactly the rights it needs. Do not run the web application under db_owner or root in the database. If the application only reads reference tables and writes to a couple of others, grant rights to those alone. This limits the scale of a leak if the application is compromised.
● Network segmentation: databases should sit in closed segments (VPC) with no direct Internet access, reachable only from application servers or by administrators via VPN. Firewalls at the cloud (Security Groups) or container level should allow connections only from specific hosts. Even if for some reason the database must be public (for example, for a mobile application), use an allowlist of specific IP addresses, or at the very least a non-standard port behind a proxy.
● Secure configs: make sure dangerous options are disabled in the database configs (for example, verify that MongoDB does not accept unauthenticated external connections and has authorization enabled; that Redis has requirepass set and bind 127.0.0.1 configured; that MySQL has the anonymous user removed and remote root login prohibited; and so on). In cloud services (such as AWS RDS), use parameter templates with encryption at rest and audit logs enabled.
● Terraform and IaC policies: if you manage infrastructure as code, implement checks, for example via the Open Policy Agent, that refuse to roll out changes violating security rules. Example policy: "a relational database must have publicly_accessible = false" for AWS RDS. Then nobody can open the database to the outside, even by mistake.
● Containers and orchestration: if the database is deployed in a container, do not forget about the security of the orchestration systems. In Kubernetes: configure the NetworkPolicy to limit which pods can access the database. In Docker Compose: do not set the database port to 0.0.0.0 unnecessarily, rather use a container network.
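As an example of automating the "secure configs" check, here is a toy linter for redis.conf. The rules mirror the recommendations above; a real configuration audit would cover far more settings:

```python
def redis_conf_issues(conf_text: str) -> list:
    """Flag common risky settings in a redis.conf: missing requirepass,
    binding to all interfaces, disabled protected mode."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(" ")
            settings[key] = value

    issues = []
    if "requirepass" not in settings:
        issues.append("no requirepass: anyone who reaches the port has full access")
    # Redis binds to all interfaces when no bind directive is given.
    if settings.get("bind", "0.0.0.0") in ("0.0.0.0", "*"):
        issues.append("bound to all interfaces: restrict bind to 127.0.0.1 or an internal IP")
    if settings.get("protected-mode") == "no":
        issues.append("protected-mode disabled")
    return issues
```

The same pattern (parse the config, assert on a handful of directives) works for MySQL, PostgreSQL, and MongoDB config files, and slots neatly into a CI step.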
In operation (production)
● Anomaly monitoring: set up alerts for suspicious activity in the database. DLP systems, for example, can track bulk data exports; if someone reads 100,000 records outside a typical scenario, an alert fires. Many DBMSs (Oracle, MS SQL) have built-in auditing tools; enable them and analyze the logs regularly.
● Data Discovery and Classification: know what data is stored where and what sensitivity. Implement the practice of Data Stewardship – appoint those responsible for the datasets. Then, in the event of an incident, you will quickly understand what exactly has leaked and what the risks are. In addition, it will help to apply encryption where stricter measures are needed (for example, personal data vs. background information).
● Regular security audits: scan your infrastructure for known problems at least once a quarter. You can use third-party services (UpGuard, Shodan Monitor, and the like) as well as open-source utilities. The goal is to find open ports, exposed control panels, Elasticsearch indexes, and similar issues before intruders do.
● Updates and patches: trivial, but critical: update your DBMS and supporting software in a timely manner. Many leaks result from exploiting vulnerabilities for which a patch already exists. Example: the MOVEit attack mentioned above, which led to public sector leaks, exploited a 0-day vulnerability; once the patch was released, it had to be installed within hours. The smaller the window of vulnerability, the lower the chance of a leak.
● Response plan: Be prepared for the worst. Having an Incident Response plan in case of a leak means reducing the damage. If a leak has occurred, it is important to quickly detect it (darknet monitoring for your company's mention, for example, will help), isolate the vulnerable segment, inform the affected persons and improve protection so that it does not happen again.
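The anomaly-monitoring recommendation above can be prototyped with a simple statistical threshold over per-interval read counts. In this sketch, the window size and the 3-sigma multiplier are arbitrary choices; production monitoring would use proper time-series baselining:

```python
from collections import deque
from statistics import mean, stdev

class ReadVolumeMonitor:
    """Naive anomaly detector for per-interval row-read counts:
    alert when the current count exceeds the recent mean
    by k standard deviations."""

    def __init__(self, window: int = 30, k: float = 3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def observe(self, rows_read: int) -> bool:
        """Record one interval's count; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= 5:  # need a minimal baseline first
            mu = mean(self.history)
            sigma = stdev(self.history) or 1.0  # guard against zero variance
            anomalous = rows_read > mu + self.k * sigma
        self.history.append(rows_read)
        return anomalous
```

Feed it counts from audit logs (rows read per user per minute, say) and route any True result to your alerting channel for a human to triage.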
Conclusion
Remember that there are no small details when it comes to information leaks. One open port or one compromised password can wipe out millions invested in security. Approach database protection as creatively as attackers approach its compromise. Keep asking yourself: "What happens if someone else's eyes see this file?", "Is there a roundabout way to extract this data?". If this kind of paranoia becomes part of the culture, the chances that your database stays out of intruders' hands rise significantly.
In the end, when everything is done right, the database remains a book under lock and key, and to read it an attacker will have to fight through a long quest past your defensive lines. Most will prefer to look for easier prey. Your data is worth it.