I was recently reminded of a post I wrote on Google's internal Google+ (RIP)
a decade(!) ago.
The topic of the post was software archaeology. What prompted it was the way
Blink (Chromium’s Web engine) internally uses reference counting that traces
back to early KHTML.
[Image: an archaeology site]
I’m reposting this decade-old observation that is even more relevant today:
Every Computer Science curriculum should include a course named “Source Code
Archaeology”.
Subjects would include:
- How to search as if you were the original developer.
- Date-based searching: connecting the dots between mailing list discussions
  and commits.
- Finding code plagiarism across repositories.
- Getting through a code refactor to find the original author.
- Linearizing commits across multiple SCM systems to follow through a
  project’s history when it switched SCMs.
- Finding the needle in the haystack: how to ask the SCM for relevant
  information and filter out the rest.
- Design, not code: how to find the author’s inspiration for the design.
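
Two of the subjects above, linearizing commits across SCM systems and date-based searching, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Commit` record, the `linearize` and `commits_near` helpers, and the sample revisions are all hypothetical, and it assumes each SCM's history has already been exported oldest-first with timezone-aware timestamps.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Commit:
    """Hypothetical, SCM-neutral commit record (not any tool's real format)."""
    scm: str        # label for the source system, e.g. "svn" or "git"
    revision: str   # revision identifier as that system names it
    when: datetime  # timezone-aware commit timestamp
    summary: str


def linearize(*histories):
    """Merge several oldest-first histories into one timeline ordered by date."""
    return list(heapq.merge(*histories, key=lambda c: c.when))


def commits_near(timeline, moment, window):
    """Return the commits whose timestamp lies within `window` of `moment`."""
    return [c for c in timeline if abs(c.when - moment) <= window]


# Invented sample data: a project that moved from one SCM to another.
svn_history = [
    Commit("svn", "r101", datetime(2005, 3, 1, tzinfo=timezone.utc), "Add refcounting"),
    Commit("svn", "r250", datetime(2007, 6, 1, tzinfo=timezone.utc), "Last SVN commit"),
]
git_history = [
    Commit("git", "a1b2c3", datetime(2007, 6, 15, tzinfo=timezone.utc), "Import from SVN"),
    Commit("git", "d4e5f6", datetime(2009, 1, 10, tzinfo=timezone.utc), "Refactor refcounting"),
]

timeline = linearize(svn_history, git_history)

# Date-based search: which commits landed within a month of a mailing list
# thread dated 2007-06-10 (also an invented date)?
thread_date = datetime(2007, 6, 10, tzinfo=timezone.utc)
for commit in commits_near(timeline, thread_date, timedelta(days=30)):
    print(commit.scm, commit.revision, commit.when.date(), commit.summary)
```

Ordering by timestamp alone is a simplification. Real migrations often rewrite dates or import the same change twice, so a serious version would also have to match commits by content or by author.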