commit fc2956a4a33ed66485b7d92776853fa030c20b5b
parent a4c60a12a8efab1870864400915c2085f163ad5c
Author: Drew DeVault <drew@ddevault.org>
Date: Tue, 18 Mar 2025 10:55:09 +0100

Clarify affected pages

Diffstat:
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/content/blog/2025-03-17-Stop-externalizing-your-costs-on-me.md b/content/blog/2025-03-17-Stop-externalizing-your-costs-on-me.md
@@ -34,12 +34,13 @@ salary.
 Now it's LLMs. If you think these crawlers respect robots.txt then you are
 several assumptions of good faith removed from reality. These bots crawl
 everything they can find, robots.txt be damned, including expensive endpoints
-like git blame, and they do so using random User-Agents that overlap with
-end-users and come from tens of thousands of IP addresses -- mostly residential,
-in unrelated subnets, each one making no more than one HTTP request over any
-time period we tried to measure -- actively and maliciously adapting and
-blending in with end-user traffic and avoiding attempts to characterize their
-behavior or block their traffic.
+like git blame, every page of every git log, and every commit in every repo, and
+they do so using random User-Agents that overlap with end-users and come from
+tens of thousands of IP addresses -- mostly residential, in unrelated subnets,
+each one making no more than one HTTP request over any time period we tried to
+measure -- actively and maliciously adapting and blending in with end-user
+traffic and avoiding attempts to characterize their behavior or block their
+traffic.
 
 We are experiencing dozens of brief outages per week, and I have to review our
 mitigations several times per day to keep that number from getting any higher.