drewdevault.com: blog and personal website of Drew DeVault
---
title: Google has been DDoSing SourceHut for over a year
date: 2022-05-25
---
Just now, I took a look at the HTTP logs on git.sr.ht. Of the past 100,000 HTTP requests received by git.sr.ht (representing about 2½ hours of logs), 4,774 were made by GoModuleProxy — about 5% of all traffic. And their requests are not cheap: every one is a complete git clone. They come in bursts, so every few minutes we get a big spike from Go, along with a constant murmur of Go traffic.
This has been ongoing since around the release of Go 1.16, which came with some changes to how Go uses modules. Since that release, following a gradual ramp-up in traffic as it was rolled out to users, git.sr.ht has had a constant floor of I/O and network load, the majority of which can be attributed to Go.

I started to suspect that something strange was going on when our I/O alarms started going off in February 2021 (we eventually had to tune these alarms up above the floor of I/O noise generated by Go), correlated with lots of activity from a Go user agent. I was able to narrow it down with some effort, but to the credit of the Go team, they did [change their User-Agent to make it more apparent what was going on][0]. Ultimately, this proved to be the end of the Go team's helpfulness in this matter.

[0]: https://github.com/golang/go/issues/44468
I did narrow it down: it turns out that the Go Module Mirror runs some crawlers that periodically clone Git repositories containing Go modules to check for updates. Once I understood this, I filed [a second ticket][1] to address the problem.

[1]: https://github.com/golang/go/issues/44577
I came to understand that the design of this feature is questionable. For a start, I never really appreciated the fact that Go secretly calls home to Google to fetch modules through a proxy (you can set [GOPROXY=direct][2] to avoid this). Even taking the utility at face value, however, the implementation leaves much to be desired: the service is distributed across many nodes which all crawl modules independently of one another, resulting in highly redundant git traffic.

[2]: https://drewdevault.com/2021/08/06/goproxy-breaks-go.html
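For your own machines, opting out of the proxy is a one-line change. Note that the checksum database (sum.golang.org) is a separate Google service that `GOPROXY=direct` alone does not disable, so avoiding call-home entirely means turning that off too — these are standard, documented Go settings:

```shell
# Fetch modules directly from their origin repositories instead of
# through Google's proxy.golang.org:
export GOPROXY=direct

# Or persist the setting with the go tool (written to Go's env file):
go env -w GOPROXY=direct

# The checksum database is a separate call-home; disable it as well
# if the goal is to stop contacting Google entirely:
go env -w GOSUMDB=off
```

Be aware that `GOSUMDB=off` also forgoes the integrity checks the checksum database provides, so it is a privacy/verification trade-off.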
```
140 8a42ab2a4b4563222b9d12a1711696af7e06e4c1092a78e6d9f59be7cb1af275
 57 9cc95b73f370133177820982b8b4e635fd208569a60ec07bd4bd798d4252eae7
 44 9e730484bdf97915494b441fdd00648f4198be61976b0569338a4e6261cddd0a
 44 80228634b72777eeeb3bc478c98a26044ec96375c872c47640569b4c8920c62c
 44 5556d6b76c00cfc43882fceac52537b2fdaa7dff314edda7b4434a59e6843422
 40 59a244b3afd28ee18d4ca7c4dd0a8bba4d22d9b2ae7712e02b1ba63785cc16b1
 40 51f50605aee58c0b7568b3b7b3f936917712787f7ea899cc6fda8b36177a40c7
 40 4f454b1baebe27f858e613f3a91dfafcdf73f68e7c9eba0919e51fe7eac5f31b
```
This is a sample from [a larger set][3], showing the hashes of git repositories on the right (names were hashed for privacy reasons) and, on the left, the number of times each was cloned over the course of an hour. The main culprit is the fact that the nodes all crawl independently and don't communicate with each other, but the per-node stats are not great either: each IP address still clones the same repositories 8–10 times per hour. [Another user][4] hosting their own git repos noted a single module being downloaded over 500 times in a single day, generating 4 GiB of traffic.

[3]: https://paste.sr.ht/~sircmpwn/b46ad0b13e864923df80cb8e8285bf1661e6f872
[4]: https://github.com/golang/go/issues/44577#issuecomment-851079949
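A tally like this can be produced with an ordinary pipeline over the access log. The sketch below is illustrative only — the request-path format (git smart-HTTP `info/refs` probes) is standard, but the function name and the assumption of one request path per line of input are mine, not git.sr.ht's actual logging setup:

```shell
# Count clones per repository, hashing repo names for privacy.
# Reads one request path per line on stdin; a git clone over HTTP
# starts with a probe of /<repo>/info/refs?service=git-upload-pack.
tally_clones() {
    grep 'service=git-upload-pack' |
        sed 's|/info/refs.*||' |
        while read -r repo; do
            # Hash the repository name so the tally can be shared.
            printf '%s' "$repo" | sha256sum | awk '{print $1}'
        done |
        sort | uniq -c | sort -rn
}
```

Feeding it two clone probes for one repository and one for another yields two lines in the same count-then-hash layout as the sample above, sorted most-cloned first.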
The Go team holds that this service is not a crawler, and thus they do not obey robots.txt — if they did, I could use it to configure a more reasonable "Crawl-Delay" to control the pace of their crawling efforts. I also suggested keeping the repositories stored on-site and only doing a git fetch, rather than a fresh git clone every time, or using shallow clones. They could also just fetch fresh data when users request it, instead of proactively crawling the cache all of the time. All of these suggestions fell on deaf ears; the Go team has not prioritized the problem, and a year later I am still being DDoSed by Google as a matter of course.
I was banned from the Go issue tracker for mysterious reasons,[^1] so I cannot continue to nag them for a fix. I can't blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct). I tried to advocate for Linux distros to patch out GOPROXY by default, citing privacy reasons, but I was unsuccessful. I have no further recourse but to tolerate having our little-fish service DoS'd by a 1.38 trillion dollar company. I will say, though, that if I were in their position, and my service was mistakenly sending an excessive amount of traffic to someone else, I would make it my first priority to fix it. But I suppose no one will get promoted for prioritizing that at Google.
[^1]: In violation of Go's own Code of Conduct, by the way, which requires that participants be notified of moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go's CoC, given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.