---
title: Google has been DDoSing SourceHut for over a year
date: 2022-05-25
---

Just now, I took a look at the HTTP logs on git.sr.ht. Of the past 100,000 HTTP
requests it received (representing about 2½ hours of logs), 4,774 came from
GoModuleProxy: 5% of all traffic. And these requests are not cheap: every one
of them is a complete git clone. They come in bursts, so every few minutes we
get a big spike from Go, on top of a constant murmur of Go traffic.

This has been going on since around the release of Go 1.16, which made some
changes to how Go uses modules. Since that release, after a gradual ramp-up in
traffic as it rolled out to users, git.sr.ht has had a constant floor of I/O and
network load, most of which can be attributed to Go.

I started to suspect that something strange was going on when our I/O alarms
began going off in February 2021, correlated with lots of activity from a Go
user agent (we eventually had to raise these alarms above the floor of I/O noise
generated by Go). It took some effort to pin down the source, but to the credit
of the Go team, they did [change their User-Agent to make it more apparent what
was going on][0]. Ultimately, that proved to be the end of the Go team's
helpfulness in this matter.

[0]: https://github.com/golang/go/issues/44468

As it turns out, the Go Module Mirror runs crawlers that periodically clone Git
repositories containing Go modules to check them for updates. Once we had
established this, I filed [a second ticket][1] to address the problem.

[1]: https://github.com/golang/go/issues/44577

I came to understand that the design of this feature is questionable. For a
start, I never really appreciated the fact that Go secretly calls home to Google
to fetch modules through a proxy (you can set [GOPROXY=direct][2] to fix this).
Even taking the utility at face value, however, the implementation leaves much
to be desired. The service is distributed across many nodes, each of which
crawls modules independently of the others, resulting in highly redundant git
traffic.

[2]: https://drewdevault.com/2021/08/06/goproxy-breaks-go.html

```
140 8a42ab2a4b4563222b9d12a1711696af7e06e4c1092a78e6d9f59be7cb1af275
 57 9cc95b73f370133177820982b8b4e635fd208569a60ec07bd4bd798d4252eae7
 44 9e730484bdf97915494b441fdd00648f4198be61976b0569338a4e6261cddd0a
 44 80228634b72777eeeb3bc478c98a26044ec96375c872c47640569b4c8920c62c
 44 5556d6b76c00cfc43882fceac52537b2fdaa7dff314edda7b4434a59e6843422
 40 59a244b3afd28ee18d4ca7c4dd0a8bba4d22d9b2ae7712e02b1ba63785cc16b1
 40 51f50605aee58c0b7568b3b7b3f936917712787f7ea899cc6fda8b36177a40c7
 40 4f454b1baebe27f858e613f3a91dfafcdf73f68e7c9eba0919e51fe7eac5f31b
```

This is a sample from [a larger set][3]. For each git repository, it shows the
number of times the repository was cloned over the course of an hour (left) and
a hash of its name (right; names were hashed for privacy reasons). The main
culprit is that the nodes all crawl independently and don't communicate with
each other. The per-node stats are not great either: each IP address still
clones the same repositories 8-10 times per hour. [Another user][4] hosting
their own git repos noted a single module being downloaded over 500 times in a
single day, generating 4 GiB of traffic.

[3]: https://paste.sr.ht/~sircmpwn/b46ad0b13e864923df80cb8e8285bf1661e6f872
[4]: https://github.com/golang/go/issues/44577#issuecomment-851079949
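
Each line of the sample above pairs a count with a hashed repository name. Below
is a minimal Go sketch that produces that kind of report from a list of clone
events. It assumes each clone has already been attributed to a repository name.
It uses SHA-256 because the published hashes are 64 hex characters long, but the
post does not say which hash (or salt) was actually used. The names `hashName`
and `tallyClones` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cloneCount pairs a hashed repository name with how often it was cloned.
type cloneCount struct {
	Count int
	Hash  string
}

// hashName hides a repository name behind a hex-encoded SHA-256 digest.
// Assumption: SHA-256 without a salt; the original hashing scheme is unknown.
func hashName(name string) string {
	sum := sha256.Sum256([]byte(name))
	return hex.EncodeToString(sum[:])
}

// tallyClones counts clones per repository (one input entry per clone) and
// returns the results sorted by count, highest first.
func tallyClones(repos []string) []cloneCount {
	counts := map[string]int{}
	for _, r := range repos {
		counts[hashName(r)]++
	}
	out := make([]cloneCount, 0, len(counts))
	for h, c := range counts {
		out = append(out, cloneCount{Count: c, Hash: h})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Hash < out[j].Hash // deterministic tie-break
	})
	return out
}

func main() {
	// Hypothetical hour of clones; the repository names are made up.
	clones := []string{"~alice/foo", "~bob/bar", "~alice/foo", "~alice/foo", "~bob/bar"}
	for _, c := range tallyClones(clones) {
		fmt.Printf("%3d %s\n", c.Count, c.Hash)
	}
}
```

Keying the map on an IP address and repository pair instead would give
per-node figures like the ones mentioned above. For scale, the other user's
report of 500 downloads producing 4 GiB works out to roughly 8 MiB per
download.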

The Go team holds that this service is not a crawler, and thus they do not obey
robots.txt. If they did, I could use it to configure a more reasonable
"Crawl-Delay" to control the pace of their crawling. I also suggested keeping
the repositories stored on-site and only doing a git fetch rather than a fresh
git clone every time, or using shallow clones. They could also simply fetch
fresh data when users request it, instead of proactively re-crawling the cache
all the time. All of these suggestions fell on deaf ears; the Go team has not
prioritized the issue, and a year later I am still being DDoSed by Google as a
matter of course.

I was banned from the Go issue tracker for mysterious reasons,[^1] so I cannot
continue to nag them for a fix. I can't blackhole their IP addresses, because
that would make all Go modules hosted on git.sr.ht stop working for default Go
configurations (i.e., without GOPROXY=direct). I tried to advocate for Linux
distros to patch out GOPROXY by default, citing privacy reasons, but I was
unsuccessful. I have no further recourse but to tolerate having our little-fish
service DoS'd by a $1.38 trillion company. I will say, however, that if I were
in their position, and my service were mistakenly sending an excessive amount of
traffic to someone else, I would make fixing it my first priority. But I suppose
no one at Google will get promoted for prioritizing that.

[^1]: In violation of Go's own Code of Conduct, by the way, which requires that participants be notified of moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go's CoC, given that I was banned once before without notice: a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.