Whilst I am grateful that CF is helping websites fight back against this (often) unwanted and possibly illegal crawling, I think this comment should set off alarm bells:
> “In the future, we’ll continue to work to make these links harder to spot and make them fit seamlessly into the existing structure of the website they’re embedded in,” its authors wrote.
I understand that CF has a good CA rep and that any such feature is likely opt-in, but something about legitimizing MITM-style link-addition techniques to "protect against AI" seems counterproductive to a trust-free and decentralized internet. Especially given that individual small website owners probably don't have the tools/resources to win this arms race.
theamk 48 days ago [-]
I think "trust-free and decentralized internet" started to die with the first email spam message... we thought at least websites are safe but AI bots proved otherwise.
CF, if anything, is only helping small websites. Compared to the alternatives (making a Facebook page), control is still in the owners' hands, and the service itself is pretty fungible - Cloudflare could be replaced by something else or removed completely, and most users wouldn't even notice.
more_corn 49 days ago [-]
Heaven help us. We are all going to get trapped in this.
czk 49 days ago [-]
This sounds much more expensive than just serving the cached content you already have.
I would hate a future where I have to do a double-take on the content I'm reading to make sure Cloudflare didn't decide to mistake me for a bot and feed me AI-generated slop.
The war on crawlers is noble in some sense, but at the end of the day I wonder if it hurts regular users more than anyone else: gatekeeping, captchas, false-flagging real people as bots, and deciding who gets to see what content on the internet.
OpenAI can afford to work around this with their billions of dollars, but the average guy will be fed AI-generated slop from CF when he tries to script a requests one-liner in Python.
theamk 48 days ago [-]
The AI slop is definitely less expensive for the site owner, as all the effort is expended by CF's infrastructure while the origin host sees zero requests.
In previous discussions, one of the big problems with AI crawlers was that they follow every link [0], including expensive ones like the "git blame" view... as well as "every commit of every log", which can easily run into the millions if you have mirrors of a few repos with many branches. It's simply impossible to cache all of that at Cloudflare, so solutions like an AI maze (or outright blocking) seem like the only way.
I feel you on the "requests one-liners in Python" - I am still running a few of those in a crontab on some server... but their time might be past. The "average guy" doesn't care about this kind of scripting, so we will get lumped in with the evil AI and blocked.
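(For concreteness, the kind of one-liner I mean is roughly the sketch below - the URL is just a placeholder, and it assumes the requests package is installed:)

  # toy example: fetch one endpoint and print the body, nothing more
  import requests
  print(requests.get("https://example.com/feed.json", timeout=10).text)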
Yeah, you are right about the caching thing. I understand what CF is doing; I just can't help but feel there is something deeply wrong about this approach. But I don't have any better ideas to propose. Maybe I'm just holding on too tight to the 'old web' that I've known and loved.
Discordian93 49 days ago [-]
The Blackwall lol
[0] https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...