drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

What-is-a-fork.md (7861B)


  1. ---
  2. date: 2019-05-24
  3. layout: post
  4. title: "What is a fork, really, and how GitHub changed its meaning"
  5. tags: ["philosophy", "sourcehut", "free software"]
  6. ---
  7. The fork button on GitHub - with the little number next to it for depositing
  8. dopamine into your brain - is a bit misleading. GitHub co-opted the meaning of
  9. "fork" to trick you into participating in their platform more. They did this in
  10. a well-intentioned way, for the sake of their pull requests feature, but
  11. ultimately this design is self-serving and causes some friction when
  12. contributors venture out of their GitHub sandbox and into the rest of the
  13. software development ecosystem. Let's clarify what "fork" really means, and what
  14. we do without GitHub's concept of one - for it is in this difference that we
  15. truly discover how git is a *distributed* version control system.
  16. **Disclaimer**: I am the founder of [SourceHut](https://sourcehut.org), a
  17. product which competes with GitHub and embraces the "bazaar[^1]" model described
  18. in this article.
  19. [^1]: Not the bazaar version control system, but bazaar the concept. This is explained later in the article.
  20. On GitHub, a fork refers to a copy of a repository used by a contributor[^2] to
  21. stage changes they'd like to propose upstream. Prior to GitHub (and in many
  22. places still today), we'd call such a repository a "personal branch". A personal
  23. branch doesn't need to be published to be useful - you can just `git clone` it
  24. locally and make your changes there without pushing them to a public, hosted
  25. repository. Using [email](https://git-send-email.io), you can send changes from
  26. your local, unpublished repository for consideration upstream. Outside of
  27. GitHub and its imitators, most contributors to a project don't have a published
  28. version of their repository online at all, skipping that step and saving some
  29. time.
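
As a rough sketch of the workflow described above, the following script stages a change in a local, unpublished clone and hands it upstream as a patch file. Everything in it is a stand-in: the `upstream` and `personal` directories, the identity, and the commit are hypothetical, and `git am` plays the part of a maintainer applying a patch from their mailbox. With a working mail setup, `git send-email` would send the patch rather than leaving it in a directory.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is needed.
export GIT_AUTHOR_NAME="Contributor" GIT_AUTHOR_EMAIL="contributor@example.org"
export GIT_COMMITTER_NAME="Contributor" GIT_COMMITTER_EMAIL="contributor@example.org"

# Stand-in for the upstream project.
git init -q upstream
(
    cd upstream
    echo "hello" > README
    git add README
    git commit -q -m "Initial commit"
)

# The contributor's personal branch: a plain local clone, never published.
git clone -q upstream personal
(
    cd personal
    echo "world" >> README
    git commit -q -a -m "README: add a second line"
    # Write the newest commit out as a mailable patch file.
    git format-patch -q -o ../outgoing -1
)

# The maintainer applies the patch as if it had arrived by email.
(
    cd upstream
    git am -q ../outgoing/*.patch
)

applied_subject=$(git -C upstream log -1 --format=%s)
echo "$applied_subject"
```

At no point does the contributor's repository need to be reachable by anyone else; only the patch travels.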
  30. [^2]: And by bots to increase their reputation, and by confused users who don't know what the button means.
  31. In some cases, however, it's useful to publish your personal branch online. This
  32. is often done when a team of people is working on a long-lived branch to later
  33. propose upstream - for example, I've been doing this while working on the RISC-V
  34. port of musl libc. It gives us a space to collaborate and work while preparing
  35. changes which will eventually be proposed upstream, as well as a place for
  36. interested testers to obtain our experimental work to try themselves. Individuals
  37. do this too: Greg Kroah-Hartman publishes Linux branches which are
  38. useful for testing upcoming changes to the Linux kernel.
  39. Greg is not alone in publishing a repo like this. In fact, there are [hundreds of
  40. kernel trees like this][kernel-git]. These act as staging areas for long-term
  41. workstreams, or for the maintainers of many subsystems of the kernel. Changes
  42. in these repositories gradually flow upwards towards the "main" tree,
  43. [torvalds/linux][torvalds/linux]. The precise meaning of "linux" is rather loose
  44. in this context. An argument could be made that torvalds/linux is Linux, but
  45. that definition wouldn't capture the LTS branches. Many distros also apply their
  46. own patches on top of Torvalds, perhaps sourcing them from the maintainers of
  47. drivers they need a bugfix for, or they maintain their own independent trees
  48. which periodically pull in lump sums of changes from other trees - meaning that
  49. the simple definition might not include the version of Linux which is installed
  50. on your computer, either. This ambiguity is a feature - each of these trees is a
  51. valid definition of Linux in its own right.
  52. [kernel-git]: https://git.kernel.org/pub/scm/linux/kernel/git/
  53. [torvalds/linux]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
  54. This is the sense in which git is "distributed". The idea of a canonical
  55. upstream is not written in stone in the way that GitHub suggests it might be.
  56. After all, open-source software is a collaborative endeavour. What makes Jim's
  57. branch more important than John's branch? John's branch is definitely more
  58. important if it has the bugfixes you need. In fact, your branch, based on Jim's,
  59. with some patches cherry-picked from John, and a couple of fixes of your own
  60. mixed in, may in fact be the best version of the software for you.
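
The mixed branch described above can be sketched in a few commands. This is only an illustration under assumed names: the `jim`, `john`, and `mine` repositories, their files, and the commit messages are all hypothetical, and local directories stand in for what would normally be remote URLs.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every repository and commit below is hypothetical.
work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME="You" GIT_AUTHOR_EMAIL="you@example.org"
export GIT_COMMITTER_NAME="You" GIT_COMMITTER_EMAIL="you@example.org"

# Jim's branch: the tree most people happen to follow.
git init -q jim
(
    cd jim
    echo "base" > core.txt
    git add core.txt
    git commit -q -m "Initial commit"
)

# John's branch: cloned from Jim's, carrying a bugfix Jim has not taken.
git clone -q jim john
(
    cd john
    echo "bugfix" > fix.txt
    git add fix.txt
    git commit -q -m "Fix the bug you care about"
)

# Your branch: based on Jim's ...
git clone -q jim mine
(
    cd mine
    # ... with John's fix cherry-picked ...
    git remote add john ../john
    git fetch -q john HEAD
    git cherry-pick FETCH_HEAD
    # ... and a fix of your own on top.
    echo "local tweak" >> core.txt
    git commit -q -a -m "Local tweak"
)

# Show the resulting history, newest commit first.
git -C mine log --format=%s
```

Nothing in this sequence asks Jim or John for permission, and neither of their trees is changed by it.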
  61. This is how the git community gets along without the GitHub model of "forks".
  62. This design has allowed the largest and most important projects in the world to
  63. flourish, and git was explicitly designed around this model. We refer to this as
  64. the "bazaar" model, the metaphor hopefully being fairly obvious at this point.
  65. There is another model, which GitHub embodies instead: the cathedral. In this
  66. model, the project has a central home and centralized governance, run by a small
  67. number of people. The cathedral doesn't necessarily depend on the GitHub idea of
  68. "forks" and pull requests - that is, you can construct a cathedral with
  69. email-driven development or some other model - but on GitHub the bazaar option
  70. is basically absent.
  71. In the introduction I said that GitHub attempts to replace an existing meaning
  72. for "fork". So what does forking actually mean, then? Consider a project with
  73. the cathedral model. What happens when there's a schism in the church? The
  74. answer is that some of the contributors can take the code, put up a new branch
  75. somewhere, and plant a flag in the ground. They rename it and commit to
  76. maintaining it entirely independently of the original project, and encourage
  77. contributors, new and old alike, to abandon the old dogma in favor of theirs.
  78. At this point, the history[^3] begins to diverge. The new contingent pulls in
  79. all of the patches that were denied upstream and starts that big refactoring to
  80. mold it to their vision. The project has been **forked**. A well-known example
  81. is when ffmpeg was forked to create libav.
  82. [^3]: Git history in particular, but also the other kind.
  83. This is usually a traumatic event for the project, and can have repercussions
  84. that last for years. The precise considerations that should go into forking a
  85. project, these repercussions and how to address them, and other musings are
  86. better suited for a separate article. But this is what "fork" meant before
  87. GitHub, and this meaning is still used today - albeit more ambiguously.
  88. If "fork" already had this meaning, why did GitHub adopt their model? The
  89. answer, as it often will be, is centralization of power. GitHub is a
  90. proprietary, commercial service, and their ultimate goal is to turn a profit.
  91. The design of GitHub's fork and pull request model creates a cathedral that
  92. keeps people on their platform in a way that a bazaar would not. A distributed
  93. version control system like git, built on a distributed communications protocol
  94. like email, is hard to disrupt with a centralized service. So GitHub designed
  95. their own model.
  96. As a parting note, I would like to clarify that this isn't a condemnation of
  97. GitHub. I still use their service for a few projects, and appreciate the
  98. important role GitHub has played in the popularization of open source. However,
  99. I think it's important to examine the services we depend on, to strive to
  100. understand their motivations and design. I also hope the reader will view the
  101. software ecosystem through a more interesting lens for having read this article.
  102. Thank you for reading!
  103. ---
  104. **P.S.** Did you know that GitHub also captured the meaning of "pull request"
  105. from git's own [request-pull](https://www.git-scm.com/docs/git-request-pull)
  106. tool? git request-pull prepares an email which will ask the recipient to fetch
  107. changes from a public repository and integrate them into their own branch. This
  108. is used when a patch is insufficient - for example, when Linux subsystem
  109. maintainers want to ship a large group of changes to Torvalds for the next
  110. kernel release. Again, the original version is distributed and bazaar-like,
  111. whereas GitHub's is centralized and makes you stay on their platform.