Google-has-been-DDoSing-sourcehut.md - drewdevault.com - [mirror] blog and personal website of Drew DeVault

---
title: Google has been DDoSing SourceHut for over a year
date: 2022-05-25
---
Just now, I took a look at the HTTP logs on git.sr.ht. Of the past 100,000 HTTP
requests received by git.sr.ht (representing about 2½ hours of logs), 4,774 came
from GoModuleProxy, or about 5% of all traffic. And their requests are not
cheap: every one is a complete git clone. They come in bursts, so every few
minutes we get a big spike from Go, along with a constant murmur of Go traffic.
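As a rough illustration of how a figure like this can be pulled out of an access log, here is a minimal sketch. It assumes the log uses the common "combined" format, where the User-Agent is the last quoted field on each line. The sample lines, addresses, and the exact `GoModuleProxy/1.0` string are made up for the example and are not taken from the real git.sr.ht logs.

```python
def user_agent_share(log_lines, needle):
    """Return (matching, total, percent) for requests whose User-Agent
    contains `needle`. Assumes combined log format (an assumption)."""
    total = 0
    matching = 0
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        total += 1
        # The User-Agent is the last double-quoted field on the line.
        parts = line.rsplit('"', 2)
        user_agent = parts[1] if len(parts) == 3 else ""
        if needle in user_agent:
            matching += 1
    percent = 100.0 * matching / total if total else 0.0
    return matching, total, percent


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [25/May/2022:10:00:00 +0000] "GET /~user/mod/info/refs HTTP/1.1" 200 512 "-" "GoModuleProxy/1.0"',
    '203.0.113.6 - - [25/May/2022:10:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [25/May/2022:10:00:02 +0000] "GET /~user/mod HTTP/1.1" 200 2048 "-" "git/2.36.1"',
    '203.0.113.8 - - [25/May/2022:10:00:03 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "curl/7.83.1"',
]

if __name__ == "__main__":
    hits, total, pct = user_agent_share(SAMPLE_LOG, "GoModuleProxy")
    print(f"{hits} of {total} requests ({pct:.1f}%)")
```

With the numbers quoted above, 4,774 of 100,000 requests works out to 4.774%, which is where the rounded 5% comes from.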

This has been ongoing since around the release of Go 1.16, which came with some
changes to how Go uses modules. Since this release, following a gradual ramp-up
in traffic as the release was rolled out to users, git.sr.ht has had a constant
floor of I/O and network load, most of which can be attributed to Go.

I started to suspect that something strange was going on when our I/O alarms
started going off in February 2021 (we eventually had to tune these alarms up
above the floor of I/O noise generated by Go), correlated with lots of activity
from a Go user agent. I was able to narrow it down with some effort, but to the
credit of the Go team they did [change their User-Agent to make more apparent
what was going on][0]. Ultimately, this proved to be the end of the Go team's
helpfulness in this matter.

[0]: https://github.com/golang/go/issues/44468

I did narrow it down: it turns out that the Go Module Mirror runs some crawlers
that periodically clone Git repositories with Go modules in them to check for
updates. Once we had narrowed this down, I filed [a second ticket][1] to address
the problem.

[1]: https://github.com/golang/go/issues/44577

I came to understand that the design of this feature is questionable. For a
start, I never really appreciated the fact that Go secretly calls home to Google
to fetch modules through a proxy (you can set [GOPROXY=direct][2] to fix this).
Even taking the utility at face value, however, the implementation leaves much
to be desired. The service is distributed across many nodes which all crawl
modules independently of one another, resulting in very redundant git traffic.

[2]: https://drewdevault.com/2021/08/06/goproxy-breaks-go.html
```
140 8a42ab2a4b4563222b9d12a1711696af7e06e4c1092a78e6d9f59be7cb1af275
57 9cc95b73f370133177820982b8b4e635fd208569a60ec07bd4bd798d4252eae7
44 9e730484bdf97915494b441fdd00648f4198be61976b0569338a4e6261cddd0a
44 80228634b72777eeeb3bc478c98a26044ec96375c872c47640569b4c8920c62c
44 5556d6b76c00cfc43882fceac52537b2fdaa7dff314edda7b4434a59e6843422
40 59a244b3afd28ee18d4ca7c4dd0a8bba4d22d9b2ae7712e02b1ba63785cc16b1
40 51f50605aee58c0b7568b3b7b3f936917712787f7ea899cc6fda8b36177a40c7
40 4f454b1baebe27f858e613f3a91dfafcdf73f68e7c9eba0919e51fe7eac5f31b
```
This is a sample from [a larger set][3]. The right-hand column is the hash of
each git repository's name (names were hashed for privacy reasons), and the
left-hand column is the number of times that repository was cloned over the
course of an hour. The main culprit is the fact that the nodes all crawl
independently and don't communicate with each other, but the per-node stats are
not great either: each IP address still clones the same repositories 8-10 times
per hour. [Another user][4] hosting their own git repos noted a single module
being downloaded over 500 times in a single day, generating 4 GiB of traffic.

[3]: https://paste.sr.ht/~sircmpwn/b46ad0b13e864923df80cb8e8285bf1661e6f872
[4]: https://github.com/golang/go/issues/44577#issuecomment-851079949
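A table like the one above can be produced with a short tally. The sketch below is a guess at the general approach rather than the script that generated the real data. It assumes the repository names have already been extracted from the clone requests, and it assumes SHA-256 as the hash, which matches the 64-character hex digests shown but is not stated in the post. The repository names are hypothetical.

```python
import hashlib
from collections import Counter


def hashed_clone_counts(cloned_repos):
    """Count clones per repository, keyed by the SHA-256 of the repo name.

    `cloned_repos` holds one repository name per clone request.
    """
    counts = Counter(
        hashlib.sha256(name.encode("utf-8")).hexdigest() for name in cloned_repos
    )
    # Most-cloned first; ties are broken by digest so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_report(rows):
    """Render rows as 'count digest' lines, like the sample above."""
    return "\n".join(f"{count} {digest}" for digest, count in rows)


if __name__ == "__main__":
    # Hypothetical repository names, one entry per observed clone.
    clones = ["~alice/foo", "~bob/bar", "~alice/foo", "~alice/foo"]
    print(format_report(hashed_clone_counts(clones)))
```

Hashing the names keeps the report shareable without revealing which repositories are involved, while still letting identical repositories be grouped together.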

The Go team holds that this service is not a crawler, and thus they do not obey
robots.txt. If they did, I could use it to configure a more reasonable
"Crawl-Delay" to control the pace of their crawling efforts. I also suggested
keeping the repositories stored on-site and only doing a git fetch, rather than
a fresh git clone every time, or using shallow clones. They could also just
fetch fresh data when users request it, instead of proactively crawling the
cache all of the time. All of these suggestions fell on deaf ears. The Go team
has not prioritized the issue, and a year later I am still being DDoSed by
Google as a matter of course.
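Two of the suggestions above are easy to sketch. This is only an illustration of the general idea, not a description of how the Go Module Mirror works or would implement it. The robots.txt contents, the user agent string, the repository URL, and the directory paths are all hypothetical. The first function reads a `Crawl-delay` using Python's standard `urllib.robotparser`. The second chooses between a one-time `git clone --mirror` and a cheaper `git fetch` depending on whether a local copy already exists.

```python
import os
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt, user_agent, default=0.0):
    """Return the Crawl-delay in seconds that robots.txt sets for user_agent,
    or `default` when no matching rule exists."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def sync_command(url, mirror_dir):
    """Return the git command that brings a local mirror of `url` up to date."""
    if os.path.isdir(mirror_dir):
        # A mirror already exists: only transfer what has changed.
        return ["git", "-C", mirror_dir, "fetch", "--prune", "origin"]
    # First contact: one full clone, kept on disk for later fetches.
    return ["git", "clone", "--mirror", url, mirror_dir]


# Hypothetical robots.txt a host might serve to slow one client down.
ROBOTS_TXT = """\
User-agent: GoModuleProxy
Crawl-delay: 30
"""

if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "GoModuleProxy/1.0"))
    print(sync_command("https://git.example.org/~user/module",
                       "/var/cache/mirrors/module.git"))
```

A shallow clone (`git clone --depth 1`) would be the third option mentioned: it still starts from scratch each time, but transfers far less history than a complete clone.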

I was banned from the Go issue tracker for mysterious reasons,[^1] so I cannot
continue to nag them for a fix. I can't blackhole their IP addresses, because
that would make all Go modules hosted on git.sr.ht stop working for default Go
configurations (i.e. without GOPROXY=direct). I tried to advocate for Linux
distros to patch out GOPROXY by default, citing privacy reasons, but I was
unsuccessful. I have no further recourse but to tolerate having our little-fish
service DoS'd by a 1.38 trillion dollar company. But I will say that if I were
in their position, and my service was mistakenly sending an excessive amount of
traffic to someone else, I would make it my first priority to fix it. But I
suppose no one will get promoted for prioritizing that at Google.

[^1]: In violation of Go's own Code of Conduct, by the way, which requires that participants be notified of moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go's CoC given that I was banned once before without notice, a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.