drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
title: Please stop externalizing your costs directly into my face
date: 2025-03-17
---

*This blog post is expressing personal experiences and opinions and doesn't
reflect any official policies of SourceHut.*

Over the past few months, instead of working on our priorities at SourceHut, I
have spent anywhere from 20-100% of my time in any given week mitigating
hyper-aggressive LLM crawlers at scale. This isn't the first time SourceHut has
been at the wrong end of some malicious bullshit or paid someone else's
externalized costs -- every couple of years someone invents a new way of ruining
my day.

Four years ago, we decided to [require payment to use our CI services][0]
because they were being abused to mine cryptocurrency. We alternated between
periods of designing and deploying tools to curb this abuse and periods of
near-complete outage when the abusers adapted to our mitigations and saturated
all of our compute with miners seeking a profit. It was bad enough having to beg
my friends and family to avoid "investing" in the scam without having the scam
break into my business and trash the place every day.

[0]: https://man.sr.ht/ops/builds.sr.ht-migration.md

Two years ago, we threatened to [blacklist the Go module mirror][1] because for
some reason the Go team thinks that running terabytes of git clones all day,
every day for every Go project on git.sr.ht is cheaper than maintaining any
state or using webhooks or coordinating the work between instances or even just
designing a module system that doesn't require Google to DoS git forges whose
entire annual budgets are considerably smaller than a single Google engineer's
salary.

[1]: https://sourcehut.org/blog/2023-01-09-gomodulemirror/
[jj]: https://github.com/jj-vcs/jj/discussions/4849

Now it's LLMs. If you think these crawlers respect robots.txt then you are
several assumptions of good faith removed from reality. These bots crawl
everything they can find, robots.txt be damned, including expensive endpoints
like git blame, every page of every git log, and every commit in every repo, and
they do so using random User-Agents that overlap with end-users and come from
tens of thousands of IP addresses -- mostly residential, in unrelated subnets,
each one making no more than one HTTP request over any time period we tried to
measure -- actively and maliciously adapting and blending in with end-user
traffic and avoiding attempts to characterize their behavior or block their
traffic.

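
To make the one-request-per-address point concrete, here is a minimal sketch of
why a per-IP rate limit has nothing to grab onto in that situation. This is an
illustration only, not SourceHut's actual mitigation code: the `PerIPLimiter`
class, the `count_blocked` helper, the limit of 10 requests per minute, and all
of the addresses are made up for the example.

```python
from collections import defaultdict, deque


class PerIPLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per `window_ms`."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.hits = defaultdict(deque)  # ip -> timestamps (ms) of accepted requests

    def allow(self, ip, now_ms):
        accepted = self.hits[ip]
        # Forget accepted requests that have aged out of the window.
        while accepted and now_ms - accepted[0] >= self.window_ms:
            accepted.popleft()
        if len(accepted) >= self.limit:
            return False
        accepted.append(now_ms)
        return True


def count_blocked(limiter, requests):
    """Return how many (ip, timestamp_ms) requests the limiter rejects."""
    return sum(not limiter.allow(ip, now_ms) for ip, now_ms in requests)


# Hypothetical load: 10,000 requests, one every 10 ms, all from a single address.
single_source = [("203.0.113.7", i * 10) for i in range(10_000)]

# The same load spread over 10,000 made-up addresses, one request each.
distributed = [(f"10.0.{i // 256}.{i % 256}", i * 10) for i in range(10_000)]

blocked_single = count_blocked(PerIPLimiter(limit=10, window_ms=60_000), single_source)
blocked_distributed = count_blocked(PerIPLimiter(limit=10, window_ms=60_000), distributed)

print(f"single source: {blocked_single} of {len(single_source)} blocked")
print(f"distributed:   {blocked_distributed} of {len(distributed)} blocked")
```

Under these assumptions the limiter rejects all but 20 of the single-source
requests and none of the distributed ones, even though the server sees the same
total load in both cases.
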
We are experiencing dozens of brief outages per week, and I have to review our
mitigations several times per day to keep that number from getting any higher.
When I do have time to work on something else, often I have to drop it when all
of our alarms go off because our current set of mitigations stopped working.
Several high-priority tasks at SourceHut have been delayed weeks or even months
because we keep being interrupted to deal with these bots, and many users have
been negatively affected because our mitigations can't always reliably
distinguish users from bots.

All of my sysadmin friends are dealing with the same problems. I was asking one
of them for feedback on a draft of this article and our discussion was
interrupted so they could go deal with a new wave of LLM bots on their own
server. Every time I sit down for beers or dinner or to socialize with my
sysadmin friends it's not long before we're complaining about the bots and
asking if the other has cracked the code to getting rid of them once and for
all. The desperation in these conversations is palpable.

Whether it's cryptocurrency scammers mining with FOSS compute resources or
Google engineers too lazy to design their software properly or Silicon Valley
ripping off all the data they can get their hands on at everyone else's expense…
I am sick and tired of having all of these costs externalized directly into my
fucking face. Do something productive for society or get the hell away from my
servers. Put all of those billions and billions of dollars towards the common
good before sysadmins collectively start a revolution to do it for you.

Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of
this garbage. I am begging you to stop using them, stop talking about them, stop
making new ones, just *stop*. If blasting CO<sub>2</sub> into the air and
ruining all of our freshwater and traumatizing cheap laborers and making every
sysadmin you know miserable and ripping off code and books and art at scale and
ruining our fucking democracy isn't enough for you to leave this shit alone,
what is?

If you personally work on developing LLMs et al., know this: I will never work
with you again, and I will remember which side you picked when the bubble
bursts.