Software Archaeology

An important aspect of the domain

2023-03-25

I was recently reminded of a post I made on Google’s internal corporate Google+ (RIP) a decade(!) ago.

The topic of the post was software archaeology. It was prompted by how Blink (Chromium’s web engine) internally uses reference counting that traces back to early KHTML.

I’m reposting this decade-old observation, which is even more relevant today:

Every Computer Science curriculum should include a course named “Source Code Archaeology”

Subjects would include:

How to search as if you were the original developer.
Date-based searching: connecting the dots between mailing list discussions and commits.
Finding code plagiarism across repositories.
Getting through a code refactor to find the original author.
Linearizing commits across multiple SCM systems to follow through a project’s history when it switched SCMs.
Finding the needle in the haystack: how to ask the SCM for relevant information and filter out the rest.
Design, not code: how to find the author’s inspiration for the design.
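
To make one of these subjects concrete, here is a minimal sketch of linearizing commits across two SCMs. It assumes each system's log has already been exported into a list sorted oldest first. The Commit type, its field names, and the sample revisions are all hypothetical, invented for illustration and not taken from any real project.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from heapq import merge


@dataclass(frozen=True)
class Commit:
    """One exported log entry (hypothetical shape, for illustration only)."""
    scm: str          # which system the entry came from, e.g. "svn" or "git"
    revision: str     # revision number or hash as reported by that system
    when: datetime    # commit timestamp, normalized to UTC
    summary: str


def linearize(*histories):
    """Merge per-SCM histories (each already oldest first) into one timeline."""
    return list(merge(*histories, key=lambda c: c.when))


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up sample data: a project that started in one SCM and moved to another.
svn_history = [
    Commit("svn", "r101", utc(2009, 3, 1), "Add ref-counted base class"),
    Commit("svn", "r140", utc(2010, 6, 12), "Last commit before migration"),
]
git_history = [
    Commit("git", "a1b2c3d", utc(2010, 6, 20), "Import from svn"),
    Commit("git", "e4f5a6b", utc(2012, 1, 9), "Refactor ref counting"),
]

timeline = linearize(svn_history, git_history)
for commit in timeline:
    print(commit.when.date(), commit.scm, commit.revision, commit.summary)
```

Merging on timestamps is only a starting point: recorded commit dates can be skewed or rewritten during a migration, so in practice the result would likely need cross-checking against mailing list threads and migration notes.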

Photo by Hulki Okan