@bridgecrewio ・ Jan 27, 2022 ・ 6 min read ・ Originally posted on bridgecrew.io
Monorepos, or the use of a single repository for every part of an application, have been around since before git was invented in 2005.
This is in contrast with the more recent approach of having separate repositories for each service and the underlying infrastructure. In recent years, however, monorepos have come back into vogue, thanks to top engineering organizations such as Google, Facebook, and Uber publicly stating that they use them, and to Microsoft saying in 2017 that it had moved Windows to a monorepo.
Bridgecrew is also part of that list! 😉 We use a monorepo to manage all of our 100+ microservices. We didn’t make that decision lightly. We had a long and animated (you know how opinionated devs can be) debate about whether to break up our repos or to leverage a monorepo.
We recognize that a monorepo is not a fit for every organization. If you are evaluating one, here are the pros and cons that we weighed and that you should weigh, too.
The primary concern with using a monorepo is access control. When using git for a monorepo, you can block write access by directory, but git was not designed to limit read access at that granularity. There are ways to create branches with branch protections, but even with those controls, an insider may still be able to gain access to more source code than if the codebase were broken up. Additionally, supply chain attacks can be amplified if proper protections aren’t put in place. A full admin (not recommended) has control over the entire codebase and could poison any part of it. That’s why using a monorepo requires a very high level of trust in your team and your security practices.
Most of the other reasons teams avoid monorepos are beginning to be solved. For example, people sometimes believe that microservices require multiple repos and that only monoliths can use a monorepo. If teams are decoupled and push code at different paces, it is undoubtedly easier to tune a separate repository for each team. Within a monorepo, however, there are ways to achieve the same partial deployments. Tools that compare changes to the running state and apply only the diffs, such as Terraform and Kubernetes, can isolate a change to a small segment of the code, and rolling deployments let that change be applied in place without downtime.
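The diff-and-apply idea can be sketched in a few lines. This is only an illustration of the concept, not how Terraform or Kubernetes are implemented; the function name `plan_changes` and the service names and versions are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_changes(running: Dict[str, str], desired: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (action, service) pairs needed to move `running` to `desired`.

    Both arguments map a service name to a version string. Services whose
    version is unchanged produce no action, so they are left alone.
    """
    plan: List[Tuple[str, str]] = []
    for name in sorted(desired.keys() | running.keys()):
        if name not in running:
            plan.append(("create", name))
        elif name not in desired:
            plan.append(("delete", name))
        elif running[name] != desired[name]:
            plan.append(("update", name))
    return plan


if __name__ == "__main__":
    # Hypothetical state: only the billing service changed on the main branch.
    running = {"auth": "v3", "billing": "v1", "search": "v7"}
    desired = {"auth": "v3", "billing": "v2", "search": "v7"}
    print(plan_changes(running, desired))
```

Even though all three services live in one repository, only the service whose desired state differs from the running state ends up in the plan.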
Additionally, teams may be afraid of the storage and networking necessary to handle development and pipelines for a monorepo. Google has an estimated 80 TB in its monorepo. Obviously, that won’t fit on a developer’s laptop, which is the kind of problem git’s shallow and partial clones address. Those clones may still be large, but you pay that cost only once; all subsequent pulls are diffs and, thus, much smaller.
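As a rough sketch of what those clone options look like in practice, the helper below builds a `git clone` command line using `--depth` (shallow clone) and `--filter=blob:none` (a partial clone that fetches file contents on demand). The helper name `clone_command`, the URL, and the destination directory are placeholders, and the command is only printed here rather than executed.

```python
from typing import List, Optional


def clone_command(url: str, dest: str, depth: Optional[int] = None,
                  blobless: bool = False) -> List[str]:
    """Build a `git clone` command for a shallow and/or partial clone."""
    cmd = ["git", "clone"]
    if depth is not None:
        # Shallow clone: truncate history to the most recent `depth` commits.
        cmd += ["--depth", str(depth)]
    if blobless:
        # Partial clone: skip file contents up front; git fetches them when needed.
        cmd += ["--filter=blob:none"]
    cmd += [url, dest]
    return cmd


if __name__ == "__main__":
    # Placeholder URL and directory, for illustration only.
    cmd = clone_command("https://example.com/org/monorepo.git", "monorepo",
                        depth=1, blobless=True)
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```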
Finally, many testing tools used in CI/CD pipelines are designed to test the entire repo. On a large monorepo, that would take hours unless the tools are configured properly. That configuration is not a simple task, but the tips later in this post can help.
The primary benefit of a monorepo is having a single source of truth. I’ve heard horror stories of people making changes to their microservice and expecting them to be deployed, only to find that they weren’t because a step was missed in a dependency’s repo. With a monorepo, you know that the running application environment matches the main branch because there is no other codebase to apply from. Therefore, if the code isn’t committed to the main branch, it isn’t in the runtime environment.
Additionally, local searching, shortcuts, and code reuse benefit from having a single repo. Search is a lot easier because your code and all dependencies are in the same repo. If multiple services share a dependency, you can download or create that package once and use it anywhere.
Finally, with the right configurations, even testing such as security scans can be configured to handle a monorepo.
To manage a monorepo properly, organize the components of your codebase into a logical directory structure. Related code should be close together in the directory tree, and work that involves different teams should be separated into different directories. A well-structured repo also helps with managing protections and auditing. Write and approval permissions can be mapped to specific directories, which maintains a least-privilege model in which each team’s write access is limited to its own directories even though the repository is shared.
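To make the directory-to-permission mapping concrete, here is a minimal sketch that picks the owning team for each changed file by the most specific matching directory. The directory layout, the team names, and the `repo-admins` fallback are all hypothetical; in practice the mapping would live in whatever ownership or protection rules your git hosting platform provides.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional, Set

# Hypothetical ownership map: directory -> team allowed to approve changes.
OWNERS: Dict[str, str] = {
    "services/billing": "team-billing",
    "services/auth": "team-auth",
    "infra": "team-platform",
}


def owner_for(path: str, owners: Dict[str, str]) -> Optional[str]:
    """Return the team owning `path`, using the deepest matching directory."""
    target = PurePosixPath(path)
    best_depth = -1
    best_team: Optional[str] = None
    for directory, team in owners.items():
        candidate = PurePosixPath(directory)
        # Match whole path components, so "services/billing-v2" is not
        # treated as being inside "services/billing".
        if candidate == target or candidate in target.parents:
            if len(candidate.parts) > best_depth:
                best_depth = len(candidate.parts)
                best_team = team
    return best_team


def required_approvers(changed: Iterable[str], owners: Dict[str, str]) -> Set[str]:
    """Teams that must approve a change; unowned paths fall back to repo admins."""
    return {owner_for(path, owners) or "repo-admins" for path in changed}


if __name__ == "__main__":
    print(required_approvers(["services/auth/login.py", "infra/vpc.tf"], OWNERS))
```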
Traditional testing tools provide feedback across an entire repository. If you are only working on modifying the code in one small microservice out of hundreds or thousands in your monorepo, any feedback on other services is just noise. Feedback, such as automated code comments, should only be provided for the resources or blocks of code that have been modified. For example, if you open up a pull request that modifies a single resource block, you shouldn’t be receiving feedback for other resource blocks in that file or, even worse, in a separate directory unless directly impacted by your code changes.
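One way to picture that filtering: keep a scanner finding only if its line range overlaps a line range touched by the pull request. The `Finding` type, the file names, and the shape of the `changed` map are assumptions made for this sketch, not the output format of any particular scanner.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    path: str
    start_line: int
    end_line: int
    message: str


def relevant_findings(findings: List[Finding],
                      changed: Dict[str, List[Tuple[int, int]]]) -> List[Finding]:
    """Keep findings whose line range overlaps a changed range in the same file.

    `changed` maps a file path to the inclusive (first, last) line ranges
    modified by the pull request.
    """
    kept: List[Finding] = []
    for finding in findings:
        for first, last in changed.get(finding.path, []):
            if finding.start_line <= last and first <= finding.end_line:
                kept.append(finding)
                break  # one overlap is enough; avoid duplicate entries
    return kept


if __name__ == "__main__":
    findings = [
        Finding("infra/s3.tf", 1, 12, "bucket is not encrypted"),
        Finding("infra/s3.tf", 40, 55, "bucket logging is disabled"),
        Finding("infra/iam.tf", 3, 9, "policy allows wildcard actions"),
    ]
    changed = {"infra/s3.tf": [(5, 7)]}  # the PR touched lines 5-7 only
    for finding in relevant_findings(findings, changed):
        print(finding.path, finding.message)
```

This simple version does not cover the "directly impacted by your code changes" case mentioned above; handling that would require knowing which resources reference the modified one.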
In the next phase of testing, when the CI/CD pipeline builds the code, limit those builds to the minimum necessary: just the pieces of code being modified and their dependencies. For compiled languages, build systems based on a directed dependency graph, like Buck (Facebook) and Bazel (Google), help by focusing builds and tests on only the code that changed, along with other optimizations such as parallelizing dependency gathering and caching artifacts. For microservices, you shouldn’t need to redeploy the entire application and all of its services for testing. Start by building and testing just the modified container; if it passes, move to a staging environment for integration testing that does require the dependent microservices to be included.
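The core idea behind graph-based build selection can be sketched as a walk over a reverse dependency graph: start from what changed and collect everything that depends on it. This is a simplified illustration, not how Buck or Bazel work internally, and the target names in `DEPS` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical dependency graph: target -> targets it depends on.
DEPS: Dict[str, Set[str]] = {
    "billing": {"libs/payments", "libs/logging"},
    "auth": {"libs/logging"},
    "search": set(),
    "libs/payments": set(),
    "libs/logging": set(),
}


def affected_targets(changed: Iterable[str], deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the changed targets plus everything that depends on them."""
    # Invert the graph so we can walk from a dependency to its dependents.
    dependents: Dict[str, Set[str]] = {}
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(target)

    affected = set(changed)
    queue = deque(affected)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # Changing the payments library means rebuilding it and billing only.
    print(sorted(affected_targets({"libs/payments"}, DEPS)))
```

Everything outside the returned set can be skipped in the first build stage and picked up later in staging, where the full set of dependent services is present.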
A monorepo can get very large, making it challenging to find relevant code. Metadata such as git blame helps you filter code down to the parts you or your teammates modified. Additionally, automated tagging makes it easier to trace from runtime environments back to the relevant code. If a vulnerability or misconfiguration is found at runtime, that tagging is an invaluable time-saver.
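As a small illustration of tracing tags, the sketch below attaches source metadata to a resource when it is defined and later turns the tags found on a running resource back into a location in the repo. The tag keys (`source_repo`, `source_file`, and so on) and the sample values are made up for this example; real tagging tools define their own key names.

```python
from typing import Dict, Optional


def trace_tags(repo: str, file_path: str, commit: str, modified_by: str) -> Dict[str, str]:
    """Build the tags to attach to a resource defined at `file_path`."""
    return {
        "source_repo": repo,
        "source_file": file_path,
        "source_commit": commit,
        "source_modified_by": modified_by,
    }


def source_location(tags: Dict[str, str]) -> Optional[str]:
    """Turn the tags on a running resource back into a repo location.

    Returns None if the resource is missing any tag needed for tracing.
    """
    required = ("source_repo", "source_file", "source_commit")
    if not all(key in tags for key in required):
        return None
    return f"{tags['source_repo']}@{tags['source_commit']}:{tags['source_file']}"


if __name__ == "__main__":
    # Hypothetical values for a resource defined in the monorepo.
    tags = trace_tags("org/monorepo", "infra/storage/s3.tf", "abc1234", "dev@example.com")
    print(source_location(tags))
```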
Strictly speaking, this isn’t a tuning recommendation, but it is important. Trust is still crucial to being successful with monorepos. As mentioned before, git doesn’t have a good way to block read access. Therefore, you’ll have to trust that your team won’t steal your trade secrets. Additionally, monitor and audit behavior to watch for insider threats.
Relying on a monorepo is not for everyone, but for some, the benefit of having a single source of truth outweighs the costs. Oftentimes what it comes down to (as it did for us at Bridgecrew) is whether the time saved from better searchability and easier traceability over time is worth having to deal with the upfront complexities.
Because of this and because many of our customers also leverage monorepos, Bridgecrew comes out of the box tuned to handle multiple repos and monorepos alike without causing a massive increase in testing time or causing noisy feedback to developers. The five recommendations above can still benefit organizations with multiple repos, but the impact is amplified for monorepo users.
Monorepos are back in popularity, at least they appear to be, but who knows. Maybe in a few years, we’ll be discussing the hot new “microrepo” trend. Regardless, your security testing should be tuned to operate with your specific environment, and Bridgecrew is here to help either way.