drewdevault.com

[mirror] blog and personal website of Drew DeVault
git clone https://hacktivis.me/git/mirror/drewdevault.com.git
commit: 8dc43e8fb5b8a1d40a72761f10d89a7f663dfe32
parent e10babea47a9c8af9990667ed42f7e5b5583ddb4
Author: Drew DeVault <drew@ddevault.org>
Date:   Tue, 18 Mar 2025 10:25:20 +0100

Please stop externalizing your costs directly into my face

Diffstat:

A content/blog/2025-03-17-Stop-externalizing-your-costs-on-me.md | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+), 0 deletions(-)

diff --git a/content/blog/2025-03-17-Stop-externalizing-your-costs-on-me.md b/content/blog/2025-03-17-Stop-externalizing-your-costs-on-me.md
@@ -0,0 +1,79 @@
+---
+title: Please stop externalizing your costs directly into my face
+date: 2025-03-17
+---
+
+Over the past few months, instead of working on our priorities at SourceHut, I
+have spent anywhere from 20-100% of my time in any given week mitigating
+hyper-aggressive LLM crawlers at scale. This isn't the first time SourceHut has
+been at the wrong end of some malicious bullshit or paid someone else's
+externalized costs -- every couple of years someone invents a new way of ruining
+my day.
+
+Four years ago, we decided to [require payment to use our CI services][0]
+because it was being abused to mine cryptocurrency. We alternated between
+periods of designing and deploying tools to curb this abuse and periods of
+near-complete outage when they adapted to our mitigations and saturated all of
+our compute with miners seeking a profit. It was bad enough having to beg my
+friends and family to avoid "investing" in the scam without having the scam
+break into my business and trash the place every day.
+
+[0]: https://man.sr.ht/ops/builds.sr.ht-migration.md
+
+Two years ago, we threatened to [blacklist the Go module mirror][1] because for
+some reason the Go team thinks that running terabytes of git clones all day,
+every day for every Go project on git.sr.ht is cheaper than maintaining any
+state or using webhooks or coordinating the work between instances or even just
+designing a module system that doesn't require Google to DoS git forges whose
+entire annual budgets are considerably smaller than a single Google engineer's
+salary.
+
+[1]: https://sourcehut.org/blog/2023-01-09-gomodulemirror/
+[jj]: https://github.com/jj-vcs/jj/discussions/4849
+
+Now it's LLMs. If you think these crawlers respect robots.txt then you are
+several assumptions of good faith removed from reality. These bots crawl
+everything they can find, robots.txt be damned, including expensive endpoints
+like git blame, and they do so using random User-Agents that overlap with
+end-users and come from tens of thousands of IP addresses -- mostly residential,
+in unrelated subnets, each one making no more than one HTTP request over any
+time period we tried to measure -- actively and maliciously adapting and
+blending in with end-user traffic and avoiding attempts to characterize their
+behavior or block their traffic.
+
+We are experiencing dozens of brief outages per week, and I have to review our
+mitigations several times per day to keep that number from getting any higher.
+When I do have time to work on something else, often I have to drop it when all
+of our alarms go off because our current set of mitigations stopped working.
+Several high-priority tasks at SourceHut have been delayed weeks or even months
+because we keep being interrupted to deal with these bots, and many users have
+been negatively affected because our mitigations can't always reliably
+distinguish users from bots.
+
+All of my sysadmin friends are dealing with the same problems. I was asking one
+of them for feedback on a draft of this article and our discussion was
+interrupted to go deal with a new wave of LLM bots on their own server. Every
+time I sit down for beers or dinner or to socialize with my sysadmin friends
+it's not long before we're complaining about the bots and asking if the other
+has cracked the code to getting rid of them once and for all. The desperation in
+these conversations is palpable.
+
+Whether it's cryptocurrency scammers mining with FOSS compute resources or
+Google engineers too lazy to design their software properly or Silicon Valley
+ripping off all the data they can get their hands on at everyone else's expense…
+I am sick and tired of having all of these costs externalized directly into my
+fucking face. Do something productive for society or get the hell away from my
+servers. Put all of those billions and billions of dollars towards the common
+good before sysadmins collectively start a revolution to do it for you.
+
+Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of
+this garbage. I am begging you to stop using them, stop talking about them, stop
+making new ones, just *stop*. If blasting CO<sub>2</sub> into the air and
+ruining all of our freshwater and traumatizing cheap laborers and making every
+sysadmin you know miserable and ripping off code and books and art at scale and
+ruining our fucking democracy isn't enough for you to leave this shit alone,
+what is?
+
+If you personally work on developing LLMs et al, know this: I will never work
+with you again, and I will remember which side you picked when the bubble
+bursts.