Knowledge is only real when shared.
December 20, 2021
Reasons why you probably don't need a monorepo.
A monorepo is a repository with multiple node packages managed together. The conventional split of a website into a Frontend and a Backend package doesn't fall under the category of a monorepo as it can easily be managed without additional tools and both parts are mostly self-contained. What we are looking at is splitting up what's traditionally a single package into multiple packages and whether this comes with any advantages.
Ever since npm packages became popular and every web application had a package.json, the idea of packages has been around. Versioned packages are a way to provide others with reusable software. In an effort to shrink popular, bloated plugins, their authors started factoring out some functionality into separate packages and publishing these independently. Often the functionality was still closely related to the main project, so it was kept in the same repository. This resulted in a couple of problems. First, it was difficult to import a now separate package into another package or to pull the packages together for integration testing. Second, there was no way to run scripts across all packages without doing it in each one separately. Third, it was quite a chore to publish all packages separately, as they were usually released all at once. Initially, so-called monorepo utility frameworks were aimed at tackling exactly these issues, and they were very successful at it. These problems only occur in a few very popular open source projects, but the hype quickly took hold and the tools were soon used in places where the problems don't exist.
Lerna, created by Sebastian McKenzie in 2015, was the first monorepo manager library. The initial implementation consisted of roughly 500 lines of source code. McKenzie was working on Babel at the time and introduced Lerna there to manage and publish all the different packages in the Babel repository/ecosystem. Babel contains some additional documentation about why it uses this approach. As a logo he chose the hydra, possibly foreshadowing the effects this package could have: in Greek mythology, the hydra stands for something you cannot get rid of.
Lerna will mostly install local packages and run scripts for you. That's the same thing a package manager already does, except for multiple packages. With that said, it wasn't a far stretch for yarn to hop on the bandwagon and provide the same features natively while lerna was trending. Yarn, a package manager living in the shadow of npm (which comes preinstalled with Node), needed features that would set it apart. It already had some, like offline installation and lock files, but Workspaces would become one of its flagship features. Workspaces closely mimicked the approach taken by lerna. Eventually, all the yarn features mentioned were also adopted and integrated into npm itself. Since the approach is based on packages, which are at the core of what package managers do, the monorepo or workspaces concept quickly made it into every package manager.
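As a sketch of how this looks today: both yarn and npm read a workspaces field from the root package.json (the project name and the packages/ glob below are placeholders):

```json
{
  "name": "my-project",
  "private": true,
  "workspaces": ["packages/*"]
}
```

With this in place, an install run at the root links the local packages together, and commands like yarn workspace some-package run build execute a script in a single package.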
The monorepo approach of splitting a huge open source project into several packages and managing and publishing them together with lerna or yarn workspaces is a great idea. As it turns out, the same approach is also used to split up applications that aren't published as packages to npm. This is where the problem starts. Self-contained parts with so-called high cohesion are put into their own packages. Achieving this type of structure is great, but it can be done without converting the project into a monorepo.
Creating separate packages allows different developers to work more independently and with more freedom. Builds and other configurations can be changed independently within the local package one is working on. While this gained freedom sounds great, as we'll see it leads to unnecessary chaos in the long term, mostly because the tools allow for and even encourage the use of different frameworks, or different versions of the same framework, in one project. However handy, these practices have long been recognized as anti-patterns. For example, using both React and Vue, or having one package written against React 16 while another has already migrated to React 17, is considered bad because it heavily increases the complexity of the application.
Linting a project that consists of just one package only requires initializing the linter once. A monorepo, on the other hand, has to initialize the linter once per package, taking up more time and memory. If something can be split into separate packages, the tool itself can split up the work and run it in parallel. In the case of a linter, as many files would be linted concurrently as there are CPU cores available. When we write tests, we already split them into suites that can be run in parallel. Once a build tool like webpack has calculated the dependency tree, it would in theory be easy to split up the work and run it in parallel. Since JavaScript doesn't really support concurrency yet, scripts can only run in parallel using child_process. Monorepos use this to their advantage: at their core, they spawn a new child_process for each script they run. Testing tools like Jest have it similarly easy, as a new environment has to be set up for every test anyway. With linting and building the situation looks a bit different, and it's more likely that in the future Rust-based tooling that leverages native concurrency will succeed and increase performance anyway.
All of the plugins we use daily have been developed to work and scale with the traditional approach of just one local package. When a large application is built, the bundler automatically splits it into parts and caches them independently, avoiding costly rebuilds. By breaking parts up into separate packages this caching mechanism no longer works, which increases build times. turbo, another monorepo tool, tries to avoid this problem created by the monorepo structure by adding its own highly elaborate caching mechanisms. By nature, monorepo frameworks have to work independently of the context of the tools and are therefore unlikely to achieve a similar degree of optimization.
In the previous post an approach is described where source code is placed in the root of the repository for easy access. Monorepo tools usually require the source code to reside nested even deeper than the usual /src
folder inside something like /packages/some-part
or /apps/web
.
When trying to pick apart the idea behind monorepos, it's a good idea to look at the takeaways its author drew from the experience. McKenzie, the author of lerna, apparently got fed up with the problems that arose when the concept was applied to the extreme. In a twist of fate he's now doing the exact opposite to avoid all the problems described. His newest project (even company) is called Rome Tools, a tool that offers the full frontend build toolchain. However, instead of multiple packages, the idea is to provide all of the necessary functionality in one seamlessly integrated package. Just like lerna, this is pretty much a totally new approach and basically the extreme opposite of the previous idea. All other similar existing tools usually rely on existing specialized packages to take care of a specific task: webpack or esbuild for the build, eslint for linting, babel for transpiling and prettier for formatting. Since tasks like linting and transpiling have almost nothing in common, it's very effective to develop these independently and then pull them together in a build tool that's easy for the programmer to use.
The difficult problems in programming are often optimization problems. The optimum lies somewhere in the middle. This optimum is hard to find and also depends on various external factors related to the people involved. McKenzie has managed to go to both extremes arguably quite far away from the optimum. The easiest route to take lies somewhere in the middle balancing out the two approaches of splitting up and integrating things.
npm-based projects are notorious for having tons of scripts inside their package.json. Each tool requires its own script. Build tools like create-react-app already run several tools with just one script (e.g. linting and type checking also happen during build). A continuous integration environment often runs several of these scripts in a specific order. Some of the monorepo tools are really good at orchestrating these scripts in so-called pipelines. Although not directly related to monorepos, this functionality has been missing from package managers and was left to build tools or the developers themselves. Users can now locally run the full pipeline of scripts in parallel, and the same command can be triggered in a continuous integration environment where the workflow doesn't have to be specified again.
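To illustrate, this is roughly the shape of such a pipeline definition in turborepo's turbo.json (the task names and dependency layout here are illustrative, and the exact schema may differ between versions):

```json
{
  "pipeline": {
    "build": { "dependsOn": ["^build"] },
    "lint": {},
    "test": { "dependsOn": ["build"] }
  }
}
```

The ^build entry means "build my dependencies first", so the tool can run independent tasks in parallel while respecting the declared order.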
To make it more practical, let's look at the most sophisticated example turborepo offers, the so-called kitchen sink starter. It consists of the following packages:
- api: an Express server
- storefront: a Next.js app
- admin: a Vite single page app
- blog: a Remix blog
- logger: an isomorphic logger (a small wrapper around console.log)
- ui: a dummy React UI library (which contains a single <CounterButton> component)
- scripts: Jest and eslint configurations
- tsconfig: tsconfig.jsons used throughout the monorepo
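The logger package illustrates how little some of these packages contain. A minimal sketch of such an isomorphic logger (the "[app]" prefix and the API are assumptions, not the starter's actual code):

```javascript
// A thin wrapper around console.log that behaves the same in Node and in
// the browser, which is all "isomorphic" means here.
function log(...args) {
  console.log("[app]", ...args);
}

log("hello"); // prints: [app] hello
```

Wrapping a whole package, build config and publishing setup around a function like this is exactly the kind of overhead this post argues against.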
Obviously, we're not going to get our hands dirty with any monorepo tools. Our goal is just to figure out what a regular setup doing the same thing would look like. Although nothing's wrong with Remix, we will use the more popular Next.js at the root of the project and integrate the blog at the root route. The storefront, which is already a Next.js app, can be taken over directly and placed under the /pages/storefront folder/route of the same Next.js app. Luckily, Next.js has API routes built in, and we place the handler inside /pages/api/message/[name].ts. Next, the admin app is relocated to the /admin route, still in the same Next.js app. With this step we get server-side rendering for free but might have to port over an Angular or Vue app to React; again, this would not be an issue if the decision to use Next.js had been made at the beginning. The React UI components can easily be shared between all these apps by placing them in a /ui folder in the root of the Next.js app. Since the logger is isomorphic, we can place it in a /helper folder and use it for the API, during server-side rendering, as well as on the client. As a tsconfig.json we'll stick with the Next.js default and modify it if necessary. The blog, storefront and admin panel will likely have the same browser support requirements, and if not, we'll stick with the lowest common denominator. For the eslint configuration found in the scripts package we don't need to do anything, as Next.js already comes with eslint dependencies preinstalled and a configuration that can be extended when necessary. For testing we're more likely to use Cypress or Playwright to see if the whole process works well, but it's also possible to integrate Jest and run tests on any part of the application. It doesn't really matter, as the example doesn't include any tests.
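As a sketch of the relocated endpoint: a Next.js API route at pages/api/message/[name].ts receives the dynamic [name] segment via req.query (the greeting response shape is an assumption, not the starter's actual code, and plain JavaScript is shown for brevity):

```javascript
// Sketch of the Express endpoint moved into a Next.js API route.
// Next.js maps the [name] part of the URL onto req.query.name.
function handler(req, res) {
  const { name } = req.query;
  res.status(200).json({ message: `hello ${name}` });
}

// In the actual pages/api/message/[name].js file this would be:
// export default handler;
```

Next.js then serves this under /api/message/:name without any separate Express server to deploy or keep in sync.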
Using the strategy described above, you'll get about a fivefold reduction in both files and lines of code. This example might be a bit more extreme than the monorepos you'll encounter in reality. Since there's now just one framework, much of the overhead is gone as well, and performance will increase by about a factor of five, too. The whole promise of monorepos to avoid complexity and increase performance is simply not tangible in any way. Although differences will probably never be this glaring, one can always expect these kinds of improvements with a more traditional architecture. This of course assumes the monorepo mixed different frameworks like React and Vue, different build tools in the same project like webpack in one package and rollup in another, and, for server-side rendering, perhaps both Next.js and Remix.
Monorepos seem great in the short term but will inevitably lead to a lot of pain over the long term. A monorepo indirectly encourages the use of multiple frameworks that in effect do the same thing, and every package might end up with different formatting and linting rules. Without a monorepo there is of course additional upfront effort in synchronization, as decisions have to be made by the team, but this pays off in the end. At the beginning, a rather savvy developer might split a traditional project into a monorepo simply for the challenge, or to ensure they'll always be able to use a stack they deem important. As new members arrive on the project, they can implement their features using the frameworks they already know and are happy as well. The person acquainted with the whole monorepo setup and all the frameworks used might leave at some point. Now a situation might arise where parts of the application need to be maintained but nobody is familiar with the frameworks used, or nobody can quickly make the mental switch into code written in a different way. At this point you have successfully summoned the mythical hydra pictured in the lerna logo: no matter the effort, there is no way back to a maintainable application other than a complete rewrite without all the different configurations and frameworks.
Monorepos avoid the need for architectural decisions to be made before or during development. Every developer or team can just set up their own package and use whatever they deem useful.
One could argue that the so-called microservices backend architecture describes something similar to a monorepo in the frontend. However, the underlying hardware is different in the case of microservices on a server. Microservices can take advantage of the easy scalability that today's cloud infrastructure providers offer: if demand is high, more services can be spun up and run on additional hardware. This provides a simple way to achieve scale without much complexity on the programmer's side, as the abstraction takes care of the issue. On the client side of web applications the circumstances are different: there is no way to run multiple instances in parallel, since JavaScript itself is single threaded, and running multiple instances in sequence incurs massive performance penalties. Most frameworks already provide concurrent features for asynchronous tasks like data fetching.
The idea behind Module Federation, introduced recently with webpack 5, is somewhat similar to the monorepo approach. Instead of running multiple webpack instances to generate multiple builds, one webpack instance is used to create multiple separate builds. Inspired by microservices, the result is called micro-frontends: just as different parts of an application can be served by different servers, the webpage can load different parts from independent servers as well. Like monorepos, this initially sounds great but in the end will usually not solve any practical problems while still adding unnecessary overhead and complexity.