Leaky Secrets in Git - Instrumentation and Response

ThetaPoint Security Reference Architecture - SRA

May 29, 2019

How Bad Can it git?

A recent research paper (Meli et al.) described a rigorous empirical study of how often secrets (cryptographic keys, API credentials, etc.) are inadvertently leaked to GitHub through SCM (source code management) tools. The numbers were alarming. The research team identified hundreds of thousands of secrets in the public GitHub BigQuery dataset using simple search techniques. Additionally, they identified thousands of secrets per day using automated searches against the GitHub API.

How Does this Happen?

Mistaken Beliefs and Improper Usage

Developers, operators, and administrators often misjudge how exposed an SCM system really is. They may conclude that environmental access controls around the SCM, or its permissions-based access controls, make it “safe” for secret storage. This misperception resembles that of users who store plaintext passwords on internal file shares in the belief that access control “protects” them. The latter is a clear violation of security practices and norms, and the same reasoning applies to the former.

Difficulty of Use

Git has a standard path-based mechanism for excluding files from version control: the .gitignore file. If you can define a glob-style pattern that matches a file name, extension, or directory, Git will skip matching untracked files when you stage changes, so they never reach your SCM of choice. (Files that are already tracked are not affected.) However, the feature only helps if you have a standard nomenclature or practice for where credentials are stored. Secrets do show up in some standard paths (like configuration files), but developers also commonly embed them directly in source code at the point of use. This makes path-based exclusion rules difficult to rely on as a first line of defense.
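
To make the limitation concrete, here is a minimal Python sketch. It uses the standard fnmatch module as a rough stand-in for .gitignore matching (real .gitignore rules have richer semantics, such as negation and directory anchoring), and every pattern, path, and credential in it is hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns, in the spirit of a .gitignore file.
IGNORE_PATTERNS = ["*.env", "*.pem", "secrets/*"]


def is_ignored(path: str) -> bool:
    """Return True if the path matches any exclusion pattern.

    Simplification: fnmatch only approximates Git's own matching rules.
    """
    return any(fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)


# Hypothetical working tree: path -> file content.
WORKING_TREE = {
    "config/prod.env": "DB_PASSWORD=made-up-password",
    "src/app.py": 'API_KEY = "hypothetical-secret"  # embedded at point of use',
}

# Paths that a purely path-based rule would still let through.
committed = [path for path in WORKING_TREE if not is_ignored(path)]
```

The configuration file is excluded, but src/app.py is not, and no sensible path pattern can exclude it without also excluding the application code itself.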

Accidents Happen

Git’s complexity creates ample opportunities for mistakes. Some of the most comprehensive documentation on its usage covers how to undo the most common errors rather than how to make them less likely. In fact, Meli et al. found no significant correlation between a user’s level of experience or frequency of use and the likelihood of leaking a secret. This is fairly strong evidence that chance contributes heavily to the base rate of secret leakage, and that you can expect your risk to compound across repositories.
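
The compounding effect can be sketched with simple arithmetic. Assume, purely for illustration, that each repository independently leaks a secret with the same probability p over some period. The chance of at least one leak across n repositories is then 1 - (1 - p)^n. The function name and the values below are hypothetical and are not figures from the study.

```python
def prob_at_least_one_leak(p: float, n: int) -> float:
    """Probability that at least one of n independent repositories leaks."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-repository leak probability of 1%.
for repos in (1, 10, 100):
    print(repos, round(prob_at_least_one_leak(0.01, repos), 3))
```

Under these assumptions, a 1% per-repository chance grows to roughly a 63% chance of at least one leak once an organization has 100 repositories.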

What Can I Do?

The Normal Tools may not Work

Source code management systems complicate the traditional security monitoring toolkit. Git alone can use four different network transports (local, HTTP, SSH, and the native Git protocol), each with several implementation-specific variants and optional encryption. Furthermore, Git repositories keep versioned object data in an application-specific file format that file scanners may not understand. Combined, these application semantics make signature-based detection tools unreliable at best.
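
The storage-format problem is easy to demonstrate. The Python sketch below builds a loose blob object the way Git does (a "blob <size>" header, a NUL byte, then the content, all deflated with zlib) and compares a raw byte-signature search against a format-aware one. It assumes the default SHA-1 object format, and the credential is made up.

```python
import hashlib
import zlib

# Hypothetical file content containing a made-up credential.
SECRET = b"hypothetical-secret-value-0123456789"
line = b'API_KEY = "' + SECRET + b'"\n'
content = line * 4

# A loose blob object is "blob <size>\0<content>", deflated with zlib.
store = b"blob " + str(len(content)).encode() + b"\0" + content
object_id = hashlib.sha1(store).hexdigest()  # the name Git gives the object
on_disk = zlib.compress(store)               # the bytes written under .git/objects/

# A scanner that greps raw files for the byte signature sees nothing...
naive_scan_hit = SECRET in on_disk
# ...while a scanner that understands the format finds the secret.
decoded_scan_hit = SECRET in zlib.decompress(on_disk)
```

Packfiles add delta compression on top of this, so a scanner that does not decode Git's formats has even less to work with in a mature repository.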

Get Inside the Workflow

SCMs are fundamentally event-driven tools, so it makes sense to harness those semantics to generate security events for your SOC. Git provides “hooks” that run your code when a user reaches a certain step of the workflow. The most important ones for security are “pre-commit” on the client side and “pre-receive” on the server side. Both let you inspect the state of the repository before the next workflow step proceeds. pre-commit runs on the client before a commit is finalized and lets you reject the commit if it fails your checks. pre-receive runs on the server when a push arrives and lets you refuse the pushed commits, so that they are never stored on the SCM itself, unless they pass your checks.
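
As a rough illustration of the client-side check, the following Python sketch scans the staged diff for two well-known credential shapes. The pattern set, the function names, and the output format are assumptions made for this example; a production hook would need a broader, tuned rule set.

```python
import re
import subprocess

# Illustrative patterns only; a real deployment needs a broader, tuned set.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, pattern name) for suspicious added lines."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect lines the commit would add, not the "+++ " file header.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def main() -> int:
    """Scan the staged changes; a non-zero return should abort the commit."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    findings = find_secrets(staged)
    for number, name in findings:
        print(f"possible {name} on staged diff line {number}")
    return 1 if findings else 0
```

To try it as a hook, the script would be saved as .git/hooks/pre-commit, made executable, and ended with `import sys` and a final `sys.exit(main())` call; Git aborts the commit when the hook exits non-zero. Because a client-side hook can be skipped with `git commit --no-verify`, the same find_secrets check is worth repeating in a server-side pre-receive hook, which receives one line per updated ref on standard input (old value, new value, ref name).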

Get Inside the Source Code

You can also gain visibility by inspecting source code that is uploaded to your SCM. You can do this out-of-band using a filesystem scanner, but since most SCMs use application-specific storage formats, you run the risk of false negatives if your scanning tools don’t understand that format. A higher-fidelity method is to intercept all written file data and inspect it. This process can incur both write latency and extra compute cost unless properly configured.
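
One way to avoid the storage-format problem in an out-of-band scan is to let Git decode its own data and scan the result. The Python sketch below asks `git log` for the patch of every commit on every ref and reports added lines that look like a private key, along with the commit that introduced them. The marker string, function names, and single pattern are assumptions for illustration, and merge commits (which `git log -p` shows without a patch by default) are not covered.

```python
import re
import subprocess
from typing import Iterable, Iterator

# Illustrative pattern only (PEM-style private key header).
PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")
COMMIT_MARKER = "COMMIT "  # hypothetical marker emitted via --format below


def scan_patch_stream(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (commit id, added line) for added lines that match the pattern."""
    commit = "unknown"
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(COMMIT_MARKER):
            # Remember which commit the following patch lines belong to.
            commit = line[len(COMMIT_MARKER):]
        elif line.startswith("+") and not line.startswith("+++ "):
            if PRIVATE_KEY.search(line):
                yield commit, line[1:]


def scan_repository(repo_path: str) -> list[tuple[str, str]]:
    """Let Git decode its own storage, then scan every commit on every ref."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color",
         "--format=" + COMMIT_MARKER + "%H"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return list(scan_patch_stream(log.splitlines()))
```

Because the scan walks history rather than the current checkout, it also surfaces secrets that were committed and later deleted, which remain retrievable from the repository.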

Contact Us

Our research team has presented extensively on building instrumentation and incident response to manage risks in source code management systems. If you would like to receive these materials (including sample configurations and code), enter your details here and we will send them to you.