Upstream-downstream relationships

Effective upstream-downstream relationship and Blink case study
2024-09-23 talk

This is an adapted version of the talk I gave at ForwardJS Ottawa on 2024-09-19.

Talk recording

I hope this post will encourage you to become a good upstream maintainer and/or a great downstream user and contributor.

The important thing is to build, don’t stop building.

I tried to condense this post as much as I could, so I left out many details.


I joined the Google Chrome team in 2007, when it had fewer than 20 people, working first on security and then on infrastructure until 2019.

If you ever used a Google Home smart display, then you used the Fuchsia Operating System. If you used any Pixel branded device, then you used the Pigweed firmware framework. I managed both EngProd teams until last year.

This year, I built the Windows infrastructure for the web browser Arc, which redefines what browsing means.

I’m a part time hardware hacker, part time angel investor, but I primarily see myself as a software engineer, and I don’t like repetitive work.

For example, I didn’t like raising and lowering the monitors at my desk, you know when you change posture and you need to adjust the monitors?

So I wrote a Chrome extension to detect when I lock and unlock my workstation to notify Home Assistant which then talks to an ESP8266 microcontroller running esphome.io to raise or lower the monitors as needed. I don’t ever have to think about having the most ergonomic setup, it just works.

And yes, this is my workstation. MWE Lab from Québec City makes it. Highly recommended!

I own the whole software and hardware stack, my project is the source of truth. I made it. It’s mine. I soldered the board, wired up everything. The software is on GitHub.

Simplified dependency tree of my project.

My project depends on a lot of software created by others: software and hardware engineers who built awesome reusable components. I’m oversimplifying the stack here; my point is that I can use python3 as is, I don’t need to modify any of these projects to maintain mine.

When I write software, I don’t patch my CPU. Generally I don’t patch my OS either. I buy off-the-shelf laptops or servers and that’s it.

It’s not true of hyperscalers. For example, Google has been using custom-tailored CPU SKUs created exclusively for it for more than a decade. The CPU is customized to benefit the hyperscaler; the modifications enable their secret sauce in domains like security and observability.

What is large scale?

It’s a matter of scale. The larger the software project, or the hardware deployment, the higher the likelihood that they need to tweak their dependencies.

I’d like first to define what a large project is. When people talk about the largest open source project, the Linux kernel is often mentioned.

Commits per week in linux.git

The Linux kernel main repository linux.git is a pretty active project with 1600 commits per week. The Linux kernel uses the git merge workflow.

Let’s see what happens when we only look at merges.

Commits per week in linux.git and merges only

Since the Linux kernel uses a merge workflow, merge commits are the equivalent of PRs. If you count only merge commits, the contribution rate is about 130 per week; it’s the red line in this graph. That’s about 20 changes per day, which is a lot.
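If you want to reproduce this kind of count on a local clone, here’s a minimal sketch; the repository path and the time window are up to you (`linux` below is just an illustrative directory name).

```shell
# Count merge commits (the PR equivalents in a merge workflow) reachable
# from HEAD in a local clone, over a given time window.
# Usage: count_merges <repo-dir> <since>, e.g. count_merges linux "1 week ago"
count_merges() {
  git -C "$1" rev-list --count --merges --since="$2" HEAD
}
```

`git rev-list --count --merges` does the filtering and counting in one pass, so no extra piping is needed.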

Let’s compare the WebKit and Blink projects.

Commits per week in WebKit.git and Blink

The two lines match up to the fork in 2013. Since they both come from a Subversion workflow, they both migrated to a rebase workflow. We can see that on average Blink contributions have been a tad higher; it’s the blue line. Still, both are within similar ranges: about 300 contributions per week, or 40 per day.

Note that I didn’t try to filter out automated commits, so human-generated changes are likely a bit lower than that. I know that in the past folks were running scripts at their desks doing automated commits on their behalf. I hope that’s not the case anymore.

Let’s compare with Chromium’s main repository src.git.

Commits per week in Chromium’s src.git

The Chromium project uses a lot of bots doing automated commits. I tried to remove the obvious ones, which leaves the rate hovering at around 2300 changes per week, or 330 per day: a commit every 4 to 5 minutes, 24 hours a day, 7 days a week.

Dips in the graph are New Year’s Eve.

It’s what I call a large project.

Graphs above were created with github.com/maruel/talk-upstream-downstream.

Big software projects like Chromium depend on a lot of other projects. And just like my monitor project, they reuse existing software.

Chromium lists them as required by the software licenses. You may notice that some projects are listed multiple times; these are often transitive dependencies, that is, dependencies of dependencies that Chromium needs.

You don’t want to reinvent the world. Yet, you may get into a state where you need to make patches to the dependencies. That’s when people start forking.

Video recording of chrome://credits

There are multiple reasons to fork. Sometimes it’s due to licensing or governance. Examples include MariaDB from MySQL when Oracle bought it, or Jenkins from Hudson when … Oracle bought it, or LibreOffice from OpenOffice when … Oracle bought it. You get the idea. 👿

Source: NBC/Ringer illustration

More recent examples include ElasticSearch, Redis, MongoDB, CockroachDB, Grafana and Terraform among others. So as you figured out, we’re going to talk about forks but also how to not fork.

Sometimes a fork happens because the authors are not ready to make the project public yet. It happened with both Apple’s Safari and Google Chrome. The history traces back more than 25 years, to when Internet Explorer 4 had just overtaken Netscape as the most popular web browser. The history here is fascinating.

🧂

Before I start, here’s a disclaimer. While I tried to make this talk grounded in facts as much as possible, I am not omniscient nor free of biases. This is not a Google Good, Apple Bad talk. While I was employed by Google, I don’t believe there was any ill intent on any side.

KDE is a Linux desktop suite. The project quickly realized it needed a web browser and built KHTML and KJS.

The funny thing is that KHTML was itself a fork of khtmlw, which was created in 1996!

In 2001, Apple silently forked KHTML to develop WebKit.

In 2003, Apple made it public as part of the initial Safari release, as required by the KHTML license.

They initially declared their desire to collaborate and contribute back, but it was clear that WebKit would never be merged back into KHTML and that WebKit would be its own source of truth, driven by business imperatives. It’s an important distinction because KDE is free software, in the GNU sense, while WebKit is only open source because it’s contaminated by KHTML’s LGPLv2 license.

In 2006, the Browzr project, created by ex-Mozilla contributors employed by Google, started using WebKit. The difference from Apple was that they started contributing patches back, using pseudonyms. There was no intention of keeping it forked, and when Chromium was open sourced in 2008, all the remaining local patches were upstreamed or addressed inside the Chromium project itself.

Google asked for a lot of structural changes in WebKit. They wanted to use their own JavaScript engine, V8, instead of JavaScriptCore. V8 was far from mature at the time. Google also asked to fundamentally change the threading model in a way that was not needed for single-process web browsers.

This created a lot of churn in WebKit, requiring the addition of new abstraction layers. Apple really showed willingness to embrace collaboration. It lasted 5 years.

A selection of critical events:

In early 2013, architectural tensions arose around the threading model: Apple engineers wanted to integrate iOS-specific asserts because refactoring the iOS code to correctly use WebKit would have taken many years.

In parallel, on March 5th, Apple was granted the WebKit trademark that they had filed a registration for in 2010. I don’t personally know if people outside Apple were aware of this.

Less than a month later on April 3rd, the tension around the design and ownership culminated in the fork of WebKit.

Many non-Apple WebKit contributors stated that they would start using Blink too, exposing a preexisting discomfort.

How to reduce Drama

Ok so all of this sounds very dramatic.

To be clear, there’s no good or evil. This is just people with different incentives. Talking to each other in person often leads to better conflict resolution than over a mailing list.

Since most core developers from both Apple and Google were in the Bay Area, they met in person to build rapport and trust. I went to one of these meetings. It was fun!

Ultimately what is important is to find solutions that meet the needs of each core contributing group.

Starting in 2008, the Chromium testing infrastructure grew extremely fast. I wanted to use these amazing ideas like running tests before submitting patches. 🤯

So in October, one month after the public release, I created the Try Server, taking inspiration from Mozilla’s infrastructure to make it possible for developers to test their patches before submission. Thereafter, I worked on the developer infrastructure on the Chromium project and I created the Commit Queue in 2010, which automatically merges patches after passing the automated presubmit tests.

One of the things I made sure of as part of my job was that our infrastructure would never fetch from WebKit’s Subversion server.

Beware of the almighty Lucy! Death Star copyright Disney (or some other Empire, dunno)

That was because our infrastructure grew so big, with hundreds of workers, that if we fetched by accident from WebKit’s root source control instead of our local mirrors, we would instantaneously kill their server and nobody would be able to check out or commit to WebKit.

We called it the Chromium CI’s Ray of Death.

The Giant Fork™
Taking it lightly

Big disclaimer here: I was not part of the group that decided to do the fork.

I was looped in because forking WebKit meant reconstructing the WebKit test infrastructure on our own. It was a significant endeavor. Someone on the team decided to buy a huge fork as a symbol of our focus to succeed.

Source control matters

On the technical side, the source control system and the contribution gatekeeping, what we call code reviews, have a huge impact. Subversion was state-of-the-art source control in 2006.

At the time, Git couldn’t be used by the Chromium team because it didn’t work on Windows yet, so we used Subversion.

On the other hand, ChromeOS started using git right away because they didn’t have to care about Windows. Lucky them.

WebKit was using Subversion too.

When you fork, the source control system matters. Forking was destructive on Subversion; you can’t keep a nice history like we did on git. Nowadays most open source projects use git (hello, Mercurial!), but that was not the case until recently, as shown with WebKit. Switching to a different version control system is extremely costly, since release management has to be redone, and the Chromium project did it twice: once from Perforce to Subversion, and once from Subversion to git.
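To make the contrast concrete, here’s a minimal sketch of why a git fork is non-destructive: the fork keeps the full shared history, so the common ancestor with upstream stays resolvable, which is what makes later syncs possible. The remote URL and branch name below are illustrative.

```shell
# Re-attach a fork to its original project: add it as a second remote,
# then resolve the common ancestor. A Subversion-era fork started from a
# flattened snapshot, so there was no equivalent of this merge base.
track_upstream() {
  git remote add upstream "$1"        # original project's URL or path
  git fetch -q upstream
  git merge-base HEAD "upstream/$2"   # prints the common ancestor commit
}
```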

What was the impact of the Blink fork? Did it help? Did it hurt?

Impact

Disclaimer: these are my personal observations.

I believe that Apple didn’t have the business incentive to significantly increase the capability of web pages on phones because of the App Store. One can argue that Google didn’t have one on Android either! And certainly there were internal tensions about this.

But ChromeOS was the key element here, its leadership really wanted to make the devices more appealing to users.

Enhancements that the Blink fork enabled or helped accelerate:

What really changed the development as you know it is the removal of vendor prefixes. I believe this was the right thing to do, similar to how auto-updating the web browser was the right thing to do. The fact that Google Chrome auto-updated itself in 2008 was extremely controversial.

The other thing that really helped make Chromium popular is that you can check out any version, build it, and get a working web browser.

And I’m really proud to have made this possible.

Chromium is dependable, it’s great to use as a dependency.

This is very important to downstream businesses leveraging the project. Chromium became an upstream. Node.js and Electron are great examples, and also all the derivative web browsers like Arc.

Individuals per affiliation per quarter (Source).

It also significantly increased the number of non-Google contributors overall.

When you have an upstream/downstream relationship, there are a few things that can help make the relationship work, on the business level, the technical level, and also the human level. How can we all meet in the middle?

I’ll share some hard-earned experiences so you can set yourself up for success if you ever see yourself in this situation. The most critical part is the upstream project, because of the inherent power dynamics. Let’s dive into it.

Being a good upstream

Self Protection

First, you need to make sure the team doesn’t burn itself out. The ultimate self-protection is being what I call a code dump. Just don’t accept contributions. There’s no problem! An example is open-weights LLMs: it’s a pure publication, with no feedback mechanism. An even better example is SQLite: sqlite.org/copyright.html.

SQLite’s maintainers save a ton of time because there’s a ton of things they don’t need to formalize. Ask for features, preferably with a P.O. number, and it’s going to be fine. 💰 This can work, but it’s more the exception than the rule because the project intentionally cannot get the benefit of outside contributors.

One benefit of such a setup is that SQLite is released under a public domain dedication. It wouldn’t be possible otherwise.

License

The license is critical. If you use a viral license like the AGPL, be ready to enforce it. A good example in the 3D printer community is Slic3r. There are a lot of parallels to draw with KHTML.

Both KHTML and Slic3r had many derivative projects. KHTML’s license is LGPLv2, while Slic3r’s is AGPLv3.

The AGPL license has extreme requirements about open sourcing derivatives. I can hear Bill Gates screaming from here about how much he hates it. Some companies got caught not open sourcing the code when they released their software, Bambu Lab being a prime example, and interestingly enough, they now have a healthy OSS project because of this required open sourcing.

Slic3r had a clear vision about doing one thing well: processing a 3D model to make it printable by an FDM printer.

Vision Statement

You want a clear vision statement associated with your project, in particular to make it clear what is out of scope. This is extremely important so folks don’t spend time creating an abstraction layer, or other contributions, that will be flat-out refused.

Sometimes the weight of a new contributor can tilt the balance. Initially WebKit assumed JavaScriptCore, and Apple agreed to create an abstraction layer because getting Google on board was worth the cost.

This kind of event is more the exception than the rule.

Owning a small project means everything is in your head. It’s crystal clear. It gets challenging when external users want to contribute and misunderstand critical subtleties in how the project is designed, or tested.

Handling Regressions

Some projects are notorious for not having tests or a CI pipeline. The problem with this approach is that any external contribution becomes a huge liability: if a regression is detected down the road, the maintainers will likely be left holding the bag and repairing the damage. What regression tests and infrastructure do is shorten the feedback loop, and even prevent regressions from happening in the first place. This requires high quality tests, and it’s work! It also requires looking at bug reports and handling them. That’s also a lot of work! Regressions are one kind of risk for the survival of the project.

Managing Risks

There are many other risks, like malicious behavior from an external party. A frightening recent example is the xz project, where a malicious person took ownership of the project and intentionally injected a backdoor targeting the ssh server. It took the attacker more than 3 years of social engineering to gain control. It’s pretty wild that, as of today, we still don’t know who was behind this attack.

As an upstream, you have to be prepared for how to handle this. Create contact points for security. If someone finds a security issue, can they contact you in a secure manner? GitHub has tooling to set up a security policy. Use it. Again, more work. All this work is starting to become costly.

Managing costs

Open source projects range from an old tarball available on the internet to a highly structured project with a ton of infrastructure.

Some projects have costs that are adjacent to the source code. Any repository-like project will suffer from that. Each of npm, Docker, PyPI, RubyGems, Rust crates, Hugging Face, and the Go module proxy is both open source software and an infrastructure with huge storage and network costs.

Upstream projects have to self-protect as early as possible, otherwise they’ll have to make sad choices down the road because users cannot free-ride their infrastructure anymore.

The advent of free CI solutions for open source projects, from Travis in the old days to GitHub Actions hosted runners in recent years, and free CDNs like Cloudflare, really helped the lower end. But on the high end, running on goodwill will not work, and someone has to be paid to maintain the CI.

BTW, that’s literally my professional job and you can hire me to help you with that!

To take an example I know well, the Chromium infrastructure has tens of thousands of machines (yes, tens of thousands) running tests on a continuous basis. Each test suite takes tens of hours to run and is scaled horizontally to get results within tens of minutes. And it looks like the only PowerPC machine is not getting any love.

Having sponsors that pay the bills to make the project sustainable is critical for large projects.

The project will not survive if no money is injected into it. Large projects take time to build and test, and the free quota offered by companies like GitHub will not be sufficient. Maybe it’s a salary to do the boring work: someone has to triage incoming bugs and feature requests. Cost management is especially important when the project has a lot of code churn.

Screenshot of chromium-swarm.appspot.com/botlist

Handling Churn

Churn happens in multiple forms.

Some folks will move code around or reformat it. If you didn’t clean your code first, others may do it for you, and may not do it the way you wanted. Porting the project to a new operating system will create a ton of churn, and the code often has to be designed, future-proofed, to handle that. We experienced this when porting Chromium to Fuchsia.

Contributions will increase your CI use, which in some cases may cause issues like exhausting your free CI quota.

Often folks will send a simple PR just to be listed as a contributor to a project. 🙋

As such, some projects ban contributions that are only typo fixes or similar.

An example is TensorFlow, where non-critical contributions are refused. The rationale is that every single PR is tested internally inside Google’s infrastructure, which is extremely expensive given Google’s mono-repo setup. While it’s true that this helps protect the project and its costs, I believe that ultimately refusing good contributions, and I insist on the word good, rubs contributors the wrong way.

TensorFlow is a highly structured project with centralized decisions within Google. Not all projects are centralized the same way.

Governance

Very large projects tend to gravitate towards one of two structures: either one company or foundation stewards the architectural decision process, or a more diffuse group of people without an official governance. The latter is often a single owner project that outgrew the original maintainer.

Governance is often under defined yet is critical in the long term.

Well organized projects have a way to signal and document important changes. It comes under many names: Intent to Implement or Intent to Ship, RFCs, RFDs, or PEPs, to name a few.

I recommend you take a look at them. It may not be something you’ve sought out, and there are treasures of well-written design docs that you can learn a lot from. Some are more about organizational issues, like PEPs, while others are squarely focused on the technical side.

Most use markdown and git to store the data; some still use issue trackers like GitHub Issues. Both TypeScript and Go seem to do that. Web browsers tend to use mailing lists instead.

Defining a strong governance structure is… work. As I said earlier, the best way to avoid that is to do a code dump, like SQLite does.

Downstream

I discussed a few ways to create a healthy upstream project. Now what about being a good downstream user and fully taking part in the project?

Do your homework

First things first, it’s critical to do your homework, and I’ll give multiple examples of things that I consider homework. Learn about the project first. Understand the contributors’ expectations. Contributors pretty much always come with good intentions. It’s been extremely rare to see evil contributions, the xz project takeover being an extreme case.

One homework item is to understand how to comply with the license. If you use a copyleft license from the GPL family, you need to publish your modifications, as e.g. Microsoft does.

Contributing as a learning experience

One of the biggest challenges as a maintainer is all the well-intentioned folks who are learning but come from too far behind and are just not there yet on the technical level to contribute at the expected quality standards of the project. It can be very time consuming for the maintainers to hand-hold every single change that inexperienced people want to contribute.

It’s fine to ask folks to clean up their work, but it’s much easier when it’s software that asks for it. A good example is enforcing strict code formatting and some linters, so that the worst contributors just drop out. Make sure tests are blocking too. You may even enforce minimum code coverage. Ask the upstream if they would appreciate such tools. They may decline, and it’s important to figure out why.
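As a sketch of what letting software ask for it can look like, here is a hypothetical blocking presubmit; `format_check`, `lint`, and `run_tests` are placeholders for whatever formatter, linter, and test runner the project actually uses.

```shell
# A minimal blocking presubmit: the patch is rejected unless every
# machine-enforced check passes. The three check names are hypothetical
# stand-ins for the project's real tooling.
presubmit() {
  for check in format_check lint run_tests; do
    "$check" || { echo "presubmit: $check failed" >&2; return 1; }
  done
  echo "presubmit: ok"
}
```

The point is that rejection comes from a script, not from a reviewer, which keeps the feedback impersonal and instantaneous.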

Understand Upstream’s Incentives

Because the upstream’s incentives are not always obvious. Some projects are extremely business-aligned, while others are much more engineering-driven. Compare a VC-backed company with a small JavaScript library authored by a single person. Both are fine; figure out what you are getting into.

Dampening One-Off Requests

Understand the upstream project’s vision to really appreciate what features should be contributed or not, and when an abstraction layer is worth injecting. A great example was when Google made WebKit multi-threaded for its out-of-process renderer.

Silent assumptions in the design are often going to be the biggest challenges to resolve because they are taken for granted. It’s obvious for the maintainer. It’s anything but obvious for the contributors. As a contributor, the work falls on your shoulders to reverse engineer the silent assumptions. Maybe even contribute a bit of documentation to spell them out?

Adding Value

As a downstream user, adding value is often in the parts that the maintainer dislikes doing. While not always the case, it will often be in the documentation, testing and correctness.

When you get more intimate with the project, program organization tasks like triaging issues, reviewing contributions from others, or even maintaining the project’s web site and discussion forums are generally very appreciated.

Be there for the long run. A success story: Bloomberg wanted a CSS grid for their terminal. They contracted Igalia to define the standards, do the design, and write the implementation on their behalf. This work took several years and it now benefits everyone!

The value that Bloomberg provided is 💰 to pay for the work. Huge props to them too!

In the last 6 years, Igalia employees have contributed thousands of patches to Chromium, consistently more than Microsoft even though they are a much smaller team. Thanks, Igalia, for making the web better.

Vevey, Switzerland. (Source)

When it’s time

Even if you add value as a downstream contributor, the drift in vision may lead you to decide that you need to fork.

Maybe you are annoyed that upstream dumps 100 changes right after their annual developer conference?

Maybe upstream is a royal jerk?

What do you do when you decide to fork instead of contributing upstream? Define your tenets. Your values. Also, it’s really important to clearly define what the source of truth is.

This step is critical.

There are two kinds of forks. You can do a soft fork, where you continue to track upstream. A good example is all the Chromium-derived browsers. They have to track upstream because the web keeps evolving and we no longer tolerate zero-day security issues like in the 90s.

You can do a hard fork, like WebKit did with KHTML or Blink with WebKit, where you may try to keep things in sync a bit, but the chasm is so deep that the projects are destined to drift apart.

In the case of soft forks, the first thing to plan is to continue contributing, to keep your fork as simple as possible. Be conscious of what upstream will accept or refuse. You need a process to sync up with upstream and manage the local changes that are not going to be upstreamed.

Source of Truth

Soft fork: tracking upstream

I experienced all three ways of tracking upstream, and what you should end up doing really depends. For a low-intensity upstream, using merge or rebase is fine, but it doesn’t scale to high-intensity upstreams. In that case, a set of patches is often used.

github.com/brave/brave-core/tree/master/patches is a great example of manual patches stacked on top of an upstream. The original source is checked out intact, then patches are applied on top. The drawback is that the patches have to be updated continuously: every new major Chromium version requires updating them. They reduce the pain by keeping most patches to one-liners, using C++ macros or GN variables to reduce the scope of changes in the patches.

Happy coding!

This is a topic I love discussing. Nerd snipe me on your favorite social network.

Need professional services? Reach me at build@maruel.ca.

Thanks to Hai, Rick and Seb for earlier draft feedback, and Brett for the pictures.

Thanks all!