A Data Leak Detection Guide for the Tech Industry in 2023

In February 2021, UpGuard researchers discovered that 51% of analyzed Fortune 500 companies were leaking information in the metadata of public documents hosted on their websites. This discovery is a window into a broader overlooked cyber threat category, increasing the risk of data breaches in the tech industry – data leaks.

Data leaks (often confused with data breaches) help hackers compress the data breach attack pathway, increasing the speed, severity, and frequency of these events.

To learn some of the common causes of data leaks in the tech sector and how to address them, read on.

Six Common Types of Data Leaks in the Technology Industry

1. Human Error

Since data leaks are caused by overlooked exposures, technically, each event in this list sits within the broader risk category of human error. At a high level, some examples of data leaks with a distinct human error attribution include:

  • Loss of work-related hardware – The loss of hardware storing confidential information, such as laptops and external hard drives, could lead to unauthorized data exposure, especially if these devices are secured with weak passwords.
  • Insecure data handling practices – Displaying internal passwords on post-it notes and accidentally sending links to sensitive information to unauthorized users. Also includes unencrypted sharing of extremely sensitive customer details, like bank account information, credit card numbers, etc.

2. Misconfigured Cloud Storage Services

A cloud storage misconfiguration is an overlooked error in the setup of a cloud service that leads to unauthorized exposure of highly sensitive data, which could include personal data like social security numbers, financial data, and personally identifiable information (PII). This threat has long been recognized as a critical security risk through its regular inclusion in the Top 10 list of vulnerabilities published by the Open Web Application Security Project (OWASP).

These exposures are not caused by security vulnerabilities but rather by human error. The detrimental consequences of specific configuration settings are often not realized until these systems are connected to the internet and tested in the wild.

While an exposure causing a data leak could be classified as a vulnerability, it’s technically incorrect to conflate the two events. The process of exploiting a software vulnerability is completely distinct from public exposures of sensitive information.

Many prestigious businesses have fallen victim to data leakage resulting from such a seemingly amateur oversight.

  • Microsoft – In October 2022, Microsoft overlooked a misconfiguration causing an open cloud endpoint potentially exposing sensitive customer data, including email addresses and phone numbers.
  • Amazon – Also, in October 2022, a misconfigured Amazon server exposed data revealing the viewing habits of Amazon Prime members.
  • Thomson Reuters – In November 2022, Thomson Reuters revealed that misconfigurations in three of its servers left sensitive customer data, including third-party server passwords in plaintext format, potentially accessible to anyone crawling the exposure.

The potential large-scale impact of the Thomson Reuters misconfiguration highlights the significant danger of these events. Had this misconfiguration remained unaddressed, cybercriminals could have used the exposed passwords to access systems utilized by businesses working with Thomson Reuters, establishing the necessary foothold for a supply chain attack.

Learn more about supply chain attacks >

Examples of Data Leaks in Popular Cloud Storage Services

Every cloud service is vulnerable to data leak-inducing misconfigurations. Some examples of such events for popular cloud solutions are outlined below.

Azure Storage Blob

Setting the public access level for an Azure Storage Blob to “Container” or “Blob” allows anyone with the URL to access the contents of the Blob or Container without authentication, creating a potentially exploitable pathway to any stored sensitive data. To prevent this, always set access levels to “Private” and manage all data access with Shared Access Signatures (SAS) and Azure Active Directory.
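
As a rough illustration of the check described above, the Python sketch below flags containers whose public access level is set to anything other than private. The inventory format (a list of dictionaries with `name` and `public_access` keys), the container names, and the function name are assumptions made for this example, not an Azure API. In practice, the values would come from whatever tool you use to export your container settings.

```python
# Access levels that allow anonymous reads, per the description above.
PUBLIC_LEVELS = {"container", "blob"}


def find_public_containers(inventory):
    """Return the names of containers whose access level allows anonymous reads.

    `inventory` is a hypothetical export: a list of dicts with a "name" key
    and a "public_access" key ("Container", "Blob", "Private", or None).
    """
    flagged = []
    for entry in inventory:
        # A missing or empty value is treated as private in this sketch.
        level = (entry.get("public_access") or "private").strip().lower()
        if level in PUBLIC_LEVELS:
            flagged.append(entry["name"])
    return flagged


# Made-up sample data for illustration only.
sample_inventory = [
    {"name": "invoices", "public_access": "Private"},
    {"name": "marketing-assets", "public_access": "Blob"},
    {"name": "db-backups", "public_access": "Container"},
    {"name": "logs", "public_access": None},
]

for name in find_public_containers(sample_inventory):
    print(f"Review container: {name}")
```

Any container this check flags is worth reviewing, even if the public setting turns out to be intentional.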

Google Cloud Storage

Setting an object’s Access Control List (ACL) to “public-read” allows anyone with the URL to the object to access its contents. Setting the ACL to “public-read-write” offers the additional privilege of modifying the contents of an object. If such a URL is exposed to the public, sensitive data stored inside an object is vulnerable to compromise.

To prevent such a data leak, always set the ACL to “private” and manage object access with Google Cloud Identity and Access Management (IAM) policies. Besides ensuring only authorized users have access to sensitive information stored in Google Cloud Storage, IAM allows you to control the level of authorized access to each specific object.
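
To make the ACL risk more concrete, here is a minimal Python sketch that looks through a list of ACL entries for public grants. Google Cloud Storage represents public access with the special entities `allUsers` and `allAuthenticatedUsers`. The sample entries and the function name are invented for this example, and treating every role other than `READER` as read-write is a deliberately conservative assumption.

```python
# Entities that Google Cloud Storage uses to represent public access.
PUBLIC_ENTITIES = {"allUsers", "allAuthenticatedUsers"}


def public_grants(acl_entries):
    """Return (entity, access) pairs for ACL entries that grant public access.

    `acl_entries` is a list of dicts with "entity" and "role" keys.
    Any role other than READER is reported as "read-write" (an assumption
    made here to err on the side of caution).
    """
    findings = []
    for entry in acl_entries:
        if entry.get("entity") in PUBLIC_ENTITIES:
            access = "read" if entry.get("role") == "READER" else "read-write"
            findings.append((entry["entity"], access))
    return findings


# Made-up ACL for illustration only.
sample_acl = [
    {"entity": "user-alice@example.com", "role": "OWNER"},
    {"entity": "allUsers", "role": "READER"},
]

for entity, access in public_grants(sample_acl):
    print(f"Public grant found: {entity} has {access} access")
```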

3. Misconfigured Software

Just like cloud storage services, cloud software is also highly vulnerable to misconfigurations leading to data leakage. The most popular example of this risk is the Microsoft Power Apps data leak of 2021. UpGuard researchers discovered that Microsoft Power Apps had an overlooked exposure to a private database via a poorly configured API – a data leak exposing 38 million sensitive records to the public.

Learn how UpGuard detected this data leak >

4. Misconfigured Home Network Services

Some examples of misconfigured network services that could result in data leaks include:

FTP

File Transfer Protocol (FTP) is a commonly used protocol for transferring large files between remote computers and servers over a network. Many remote setups use FTP as a backup service, which could involve sensitive company information.

When an FTP service is misconfigured, any sensitive data stored on the computer associated with the service could become accessible to unauthorized users.

An example of a misconfiguration that can lead to a vulnerable FTP service is not disabling anonymous access. This could allow anyone to access the FTP service without authentication, potentially exposing sensitive data to unauthorized users.
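
One way to verify this on a server you administer is to attempt an anonymous login yourself. The sketch below uses Python's standard `ftplib` module. The function name and the `ftp_factory` parameter (included only so the check can be exercised without a live server) are choices made for this example. Only run it against hosts you are authorized to test.

```python
import ftplib


def allows_anonymous_login(host, port=21, timeout=10.0, ftp_factory=ftplib.FTP):
    """Return True if the FTP server accepts an anonymous login.

    Connection problems (unreachable host, timeout) are not handled here
    and will propagate as exceptions.
    """
    ftp = ftp_factory()
    try:
        ftp.connect(host, port, timeout=timeout)
        # Calling login() with no arguments attempts an anonymous login.
        ftp.login()
        return True
    except ftplib.error_perm:
        # The server rejected the anonymous login attempt.
        return False
    finally:
        ftp.close()
```

A result of `True` means anyone on the network path to that server can log in without credentials, so anonymous access should be disabled unless it is a deliberate choice for non-sensitive files.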

Rsync

Rsync allows Unix-like systems, such as Linux, to transfer and synchronize files between local and remote systems. When an Rsync service is misconfigured, it’s vulnerable to unauthorized access to any sensitive data stored on a remote endpoint.

Examples of misconfigurations that can lead to a vulnerable Rsync service include:

  1. Not using firewalls to restrict access to the Rsync service: This can allow unauthorized users to access the Rsync service and transfer sensitive data.
  2. Not limiting the number of concurrent connections: This can allow an attacker to overload the Rsync service and cause a denial-of-service (DoS) attack.

    Learn more about DDoS attacks >

  3. Not using encrypted connections: This can allow an attacker to intercept sensitive data being transferred via the Rsync service.

    Learn more about encryption >

  4. Not changing the default port used by the Rsync service: This can make it easier for attackers to find and exploit vulnerabilities in the Rsync service.
  5. Not using strong passwords for Rsync accounts: This creates a critical vulnerability to brute force attacks, opening a potential pathway to any sensitive data stored on an Rsync server or user endpoint.
  6. Not disabling anonymous access to the Rsync service: This can allow anyone to access the Rsync service without authentication, potentially exposing sensitive data to unauthorized users.
  7. Not setting proper permissions for the Rsync service: This can allow unauthorized users to access sensitive data stored on the Rsync server or to modify the data being transferred via the Rsync service.
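
Several of the items above can be checked by reading the daemon's configuration file. The following Python sketch parses the text of an `rsyncd.conf` file and reports modules that lack `auth users`, `hosts allow`, or a `max connections` limit, or that disable `read only`. Those four parameter names are standard `rsyncd.conf` settings. The parser is a simplification written for this example (it ignores line continuations and include directives), and the function names are hypothetical.

```python
FALSE_VALUES = {"no", "false", "0"}


def parse_rsyncd_conf(text):
    """Split rsyncd.conf text into (global_settings, {module_name: settings})."""
    global_settings, modules = {}, {}
    current = global_settings
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = modules.setdefault(line[1:-1].strip(), {})
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            # Normalize the parameter name: lowercase, single internal spaces.
            current[" ".join(key.lower().split())] = value.strip()
    return global_settings, modules


def audit_rsync_modules(text):
    """Return {module_name: [issues]} for modules with risky settings."""
    global_settings, modules = parse_rsyncd_conf(text)
    findings = {}
    for name, settings in modules.items():
        effective = {**global_settings, **settings}  # module values override globals
        issues = []
        if not effective.get("auth users"):
            issues.append("no 'auth users': anonymous access is allowed")
        if not effective.get("hosts allow"):
            issues.append("no 'hosts allow': any host may connect")
        if effective.get("read only", "yes").lower() in FALSE_VALUES:
            issues.append("'read only' is disabled: clients can modify files")
        if effective.get("max connections", "0") == "0":
            issues.append("no 'max connections' limit is set")
        if issues:
            findings[name] = issues
    return findings
```

This only covers what the configuration file can show. Firewall rules and transport encryption (for example, running Rsync over SSH) need to be verified separately.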

Git Services

A misconfigured Git service creates a series of vulnerabilities offering hackers a smorgasbord of potential cyberattacks to choose from, including:

  1. Data breaches: A misconfigured Git service creates a potential pathway to any sensitive data stored on a Git service, such as source codes and passwords. The ease with which this data can be accessed through a misconfiguration classifies this threat as a data leak.

    Learn how to prevent data breaches >

  2. Unauthorized code modifications: Attackers can modify the code stored within the Git service, potentially leading to security vulnerabilities or malicious code being introduced into the codebase.
  3. Remote code execution: Attackers can execute code on the Git service, potentially compromising the security of the system and allowing them to access sensitive data stored on the service.
  4. Access control bypass: Unauthorized users can bypass access control mechanisms and gain access to the Git service, potentially leading to data breaches and other security incidents.

    Learn more about access control >

  5. Information theft: Attackers can steal sensitive information, such as source code, passwords, and personal information, stored within the Git service.
  6. Repository spoofing: Attackers can create fake repositories that mimic legitimate ones, potentially tricking users into downloading malicious code or exposing sensitive data.

Examples of misconfigurations that can lead to a vulnerable Git service include:

  1. Not using secure transport protocols: A Git service should be configured to use secure protocols, such as HTTPS or SSH, to prevent malicious interception of data in transit.
  2. Not setting up access control properly: Insufficient user access controls make Git services highly vulnerable to unauthorized access.
  3. Misconfigured firewall: Although a firewall is a separate solution from a Git service, a misconfigured firewall provides a pathway for exploiting a misconfigured Git service. Conversely, a properly configured firewall could prevent sensitive data exposure from misconfigured Git services.
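
The first item in the list above is straightforward to check from the remote URLs your repositories use. The Python sketch below classifies a remote URL as secure (HTTPS or SSH), insecure, or local. The function name and the three labels are choices made for this example, and treating every scheme other than `https` and `ssh` as insecure is a conservative assumption.

```python
import re

SECURE_SCHEMES = {"https", "ssh"}

# scp-like syntax such as "git@example.com:org/repo.git", which Git runs over SSH.
SCP_LIKE = re.compile(r"^(?:[^@/\s]+@)?[^:/\s]+:(?!//)")


def remote_transport(url):
    """Classify a Git remote URL as 'secure', 'insecure', or 'local'."""
    if "://" in url:
        scheme = url.split("://", 1)[0].lower()
        if scheme == "file":
            return "local"
        # Anything that is not HTTPS or SSH (for example git:// or http://)
        # is reported as insecure in this sketch.
        return "secure" if scheme in SECURE_SCHEMES else "insecure"
    if SCP_LIKE.match(url):
        return "secure"
    return "local"


# Example URLs for illustration only.
for remote in (
    "https://example.com/org/repo.git",
    "git@example.com:org/repo.git",
    "git://example.com/org/repo.git",
    "http://example.com/org/repo.git",
):
    print(remote, "->", remote_transport(remote))
```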

GitHub

GitHub, the most popular code hosting platform for developers, software engineers, and even cybersecurity experts, is commonly a source of data leaks resulting from misconfigurations – either within the GitHub product or its integrated services.

Some examples of events leading to GitHub-related data leaks include:

  1. Accidentally committing sensitive files (such as passwords, API keys, or secret keys) to public repositories.
  2. Improperly configured .gitignore files, causing sensitive information to be tracked and pushed to the repository.
  3. Leaked access tokens or secrets stored in environment variables.
  4. Hardcoded credentials in source code.
  5. Unsecured Google Cloud Storage or Amazon Web Services (AWS) buckets linked to a GitHub repository.
  6. Exposed sensitive data in code comments and pull request descriptions.
  7. Unsecured use of Git LFS (Large File Storage), leading to the exposure of sensitive data in large binary files.
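
Many of the events listed above come down to a secret ending up in tracked text. As a simplified illustration of how such leaks can be caught before a commit, the Python sketch below scans text line by line with a few regular expressions. The three rules and their names are assumptions chosen for this example. Dedicated secret scanners use far larger rule sets and additional checks to reduce false positives.

```python
import re

# A small, illustrative rule set; not an exhaustive list of secret formats.
RULES = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Header line of a PEM-encoded private key.
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A credential-like name assigned a quoted literal value.
    "hardcoded_secret": re.compile(
        r"(?i)(?:password|passwd|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text):
    """Return a list of (line_number, rule_name) for every rule match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_no, rule_name))
    return findings
```

A check like this could run as a pre-commit hook or in a build pipeline so that a match blocks the change before it reaches a public repository.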

5. Publishing of Sensitive Data Stolen in a Data Breach

When sensitive data and intellectual property stolen in cyberattacks are published on the dark web, these events are classified as data leaks. A data leak is usually the final stage of the attack lifecycle. Following a successful breach, hackers either freely post stolen data on dark web forums – as an extortion tactic in a ransomware attack – or publish it for sale in a cybercriminal marketplace.

Given the high worth of sensitive data in a cybercriminal economy, it’s safe to assume that all breach data will eventually be leaked on the dark web.

6. Data Leaks from Third-Party Vendor Vulnerabilities

The scope of data leaks extends beyond your IT borders and into your entire third-party vendor network. Because organizations and their third-party providers are now more connected than ever, each vendor is a potential attack vector to your sensitive data if they are vulnerable to data leaks.

Vendor-related data leaks are caused by the following:

  1. Insecure transmission: If the transfer of sensitive data is not encrypted or secured, it can be intercepted and exposed during transmission.
  2. Lack of authentication: If the third-party service used to transfer sensitive data does not have proper authentication mechanisms in place, it’s vulnerable to unauthorized access.
  3. Unsecured storage: If the data is stored in an unsecured third-party storage solution, it can be accessed by unauthorized parties or exposed in a data breach.
  4. Insufficient access controls: If the third-party service used to transfer sensitive data does not have proper access controls in place, it can lead to data exposure through unauthorized access or modification.
  5. Third-party breaches: If a third party suffers a security breach and their compromised data is published on the dark web, malicious actors could use this information to breach the third party and any of its clients, which could include your business.

Five Ways Tech Companies Can Detect Data Leaks in 2023

An effective strategy for detecting data leaks must be multifaceted to account for the limitations of each individual solution. A suggested approach comprises five components:

1. Attack Surface Scanning

Scanning all internet-connected devices in your ecosystem for security vulnerabilities will uncover the potential data leaks these exposures create. For example, a scanning solution like Shodan can discover publicly accessible servers vulnerable to compromise through reported exposures.

A more scalable alternative not requiring manual management is an automated attack surface scanning solution with real-time vendor security posture tracking. Such a combination allows third-party vendors with failing security performance to be readily assessed for security risks potentially leading to your exposed data.
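
At its simplest, scanning means checking which services answer on a host you own. The Python sketch below tests whether the default ports for FTP (21), the Rsync daemon (873), and the Git daemon (9418) accept TCP connections. It is a basic connectivity check, not a substitute for Shodan or an attack surface management product, and the function names are chosen for this example. Only scan hosts you are authorized to test.

```python
import socket

# Default ports for the network services discussed earlier in this guide.
SERVICE_PORTS = {"ftp": 21, "rsync": 873, "git": 9418}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exposed_services(host, ports=SERVICE_PORTS, timeout=2.0):
    """Return the names of services whose port accepts connections."""
    return [name for name, port in ports.items() if is_port_open(host, port, timeout)]
```

An open port is not a data leak on its own, but each one found from outside your network is a service whose configuration deserves review.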

Learn more about attack surface management >

2. Penetration Testing

Most misconfigurations causing data leaks are difficult to detect with scanning solutions alone. For example, leaky storage buckets exposing sensitive data to the public are often not discoverable with attack surface scanning methods. These hidden regions of the attack surface are best discovered through penetration testing.

A regular penetration testing schedule could help you discover and address hidden exposures before they’re exploited by hackers. A thorough pen test could have potentially discovered the unsecured API that led to Optus’ enormous data breach in 2022.

3. Security Assessments

Security assessments (also known as security questionnaires) could reveal internal and third-party vulnerabilities linked to data leaks by analyzing your threat landscape against popular cybersecurity standards.

Since data leaks could originate from a broad range of vulnerabilities, an ideal security assessment should force an organization to consider each aspect of its security posture by asking questions about:

  • Assets
  • Access Controls
  • Data Storage
  • Data Security
  • Network Security
  • Third-party Security
  • Incident Response Plans
  • Compliance with Relevant Laws and Regulations

An example of a security questionnaire covering such a broad range of controls is the CyberRisk Questionnaire available on the UpGuard platform.

4. Continuous Scanning of Common Data Leak Hosts on the Dark Web

A data leak detection solution is one of the best security measures for preventing ongoing compromise following a data breach event. Such a solution continuously scans common data leak hosts on the dark web, including ransomware blogs which serve as placards for increasing portions of stolen data during the extortion phase of a ransomware attack.

A data leak detection solution alone, however, could become more of an administrative headache than a valuable security control. This is because entirely automated data leak software often fails to consider the broader context of a leak, leading to false positive notifications. An ideal data leak detection program should combine an automated component – to ensure complete coverage of common data leak hosts – with a human component – to filter out false positives based on an expert understanding of each data leak context.

Learn how to reduce false positives in data leak detection >

5. Security Awareness Training

With the exception of insider threats, which are a rarity, employees are not purposefully choosing behaviors that expose sensitive company information. The good news is that because data leaks caused by human error are not motivated by malicious intentions, they can be easily addressed with cybersecurity awareness training. Not only is this one of the easiest and most impactful methods of increasing your data breach resilience, it will also make the lives of your security teams much easier!

Human error is the primary factor in most successful data breaches. If you can teach your staff how to correctly identify cyber threats, you could protect your business from a majority of potential data breach events.

Examples of poor employee habits that lead to data leaks include:

  • Reusing the same password for multiple accounts, including sensitive internal accounts.
  • Failing to secure personal devices, such as laptops and smartphones, that contain internal data.
  • Sending sensitive emails to the wrong recipient.
  • Failing to dispose of physical documents containing sensitive information securely.
  • Using public Wi-Fi networks to access and transmit sensitive data.
  • Not using encryption to protect sensitive data while it is being transmitted.
  • Not logging out of sensitive internal systems after use.
  • Not regularly updating security software and systems to protect against new threats.
  • Not performing regular security audits to detect and remediate vulnerabilities in internal systems.
  • Leaving sensitive information visible on screens in public areas.
  • Not limiting access to sensitive data to only those who need it for their job responsibilities.

A cybersecurity training program focused on mitigating the causes of data leaks should cover the following essential topics. Many of the listed items are supported with free resources that can be used for training content inspiration.

  1. Understanding sensitive data: What constitutes sensitive data and why it’s important to protect it.

    Learn about data security >

  2. Password security: Best practices for creating and managing strong passwords, including password length, complexity, and the use of password managers.

    Learn about best password practices >

  3. Social engineering: Understanding how attackers use social engineering techniques to trick individuals into revealing sensitive information. This module should cover phone scams and how malicious actors use social media to commit cybercrime.

    Learn about social engineering >

  4. Phishing attacks: How to recognize and avoid phishing attacks, including fake emails, websites, and phone calls.

    Learn about phishing >

  5. Email security: How to secure email communications with encryption and digital signatures.

    Learn email security best practices >

  6. Mobile device security: Best practices for securing mobile devices, including smartphone and tablet security, and the dangers of mobile malware.

  7. Physical security: Best practices for protecting sensitive data from theft or loss. Should cover the use of encrypted drives and how to securely dispose of physical devices storing sensitive data.

    Learn how to secure USB flash drives >

  8. Cloud security: Understanding the security risks associated with using cloud-based services and how to secure cloud-based data properly.

    Learn more about common cloud misconfigurations >

  9. Remote access security: Best practices for securing remote access to internal networks and systems, including the use of virtual private networks (VPNs) and multi-factor authentication (MFA).

    Learn how hackers can bypass MFA >

  10. Incident response: How to recognize and respond to a data breach or other security incident, including the importance of having a well-documented incident response plan.

    Learn more about incident response plans >

  11. Identity theft: Understanding the risk of identity theft, identity breaches, and how they could lead to unauthorized systems access leading to sensitive data compromise.

    Learn how to avoid falling victim to identity theft >

  12. Secure coding practices: This module should address secure coding practices that prevent leaks on GitHub and the dangers of hardcoding credentials in source code – the impact of a data breach could be significantly lessened if internal credentials cannot be discovered through compromised source code.
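
For the secure coding module, a short before-and-after example can help. The Python sketch below reads a credential at runtime instead of hardcoding it in source code. Environment variables are only one option and must themselves be kept out of committed files. The variable name `DEMO_API_TOKEN` and the function name are hypothetical.

```python
import os


def get_required_secret(name):
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing required secret: set the {name} environment variable"
        )
    return value


# Instead of writing a literal token into the source file, the value is
# supplied by the environment at runtime:
#     api_token = get_required_secret("DEMO_API_TOKEN")
```

Failing at startup when the value is missing makes it less tempting to paste a fallback credential into the code.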


Data Leak Detection and Prevention with UpGuard

UpGuard’s data leak detection solution helps tech companies rapidly detect and shut down leaks across common hosts on the dark web, including ransomware blogs. With the addition of cybersecurity experts contextualizing each discovery to remove false positives, UpGuard empowers the technology industry with an accurate, efficient, and scalable data leak prevention program to complement existing cybersecurity efforts.