I always wondered who detects when downdetector.com is down.
rolandog 136 days ago [-]
Each downed page diminishes me,
for I am involved in WANkind.
Therefore, ping not to know
for whom the downdetector detects,
it detects for thee.
AmbroseBierce 136 days ago [-]
Snark detectors are all the way up tho
seydor 136 days ago [-]
Right now the order is reversed.
Look, I think we need a resilient system that routes packets via multiple possible pathways, preferably all of them, so that ideally nothing is ever fully down. We can name that system the undownnet.
Poudlardo 136 days ago [-]
haha this website actually is useful
lopatin 136 days ago [-]
I think it's down though
dlillard0 136 days ago [-]
[dead]
justmarc 136 days ago [-]
Down detector slop
loopdoend 136 days ago [-]
No idea why the media always relies on/cites this useless site...
dtf 136 days ago [-]
Edinburgh Airport is also down, suspending all flights after an "IT issue with our air traffic control provider". Not sure if this is coincidental, but the timing is rather suspicious!
johncolanduoni 136 days ago [-]
Maybe the little plane icons on the ATC screens are a PNG hosted on some Cloudflare domain.
AmbroseBierce 136 days ago [-]
They laughed at my base64 encoded icons, now there, enjoy your downtime.
Nextgrid 136 days ago [-]
Not a safety-critical system but I know passenger information screens in at least some airports are just full-screen browsers displaying a SaaS-hosted webpage.
Popeyes 136 days ago [-]
Fontawesome?
Maxion 136 days ago [-]
minified JS dependency loaded from some CDN?
huflungdung 136 days ago [-]
[dead]
hoherd 136 days ago [-]
These days planes got problems with all kinds of clouds.
hexbin010 136 days ago [-]
BBC reporting that the airport stated it was unrelated
jan_Sate 136 days ago [-]
That's odd. Why would you want to rely on Cloudflare for airport stuff?
ErroneousBosh 136 days ago [-]
Annoyingly I wanted to fly a parcel from Edinburgh up to Stornoway, but it's looking like I'd be quicker driving the seven hours up to the ferry terminal myself.
7-Zark-7 136 days ago [-]
The old guard has left, as they were too much of an expense in this cost-cutting age... without mentors, crap creeps in, and now we are seeing what happens when people who don't know how things work are in charge...
__turbobrew__ 136 days ago [-]
It may not be true everywhere, but at my company we 100% had more SEVs after two rounds of RIFs. We are talking simple statistics of SEVs per month plotted against RIFs.
bflesch 136 days ago [-]
Why not build the next cloudflare then, I'm sure it is appreciated by HN folks.
ai-christianson 136 days ago [-]
Or maybe we just move away from cloudflare-like services altogether.
bflesch 136 days ago [-]
Ideally, yes. Maybe someone can build a CDN on top of uncloud.
smallerize 136 days ago [-]
And just live with high latency?
jsheard 136 days ago [-]
The website you're using right now is hosted from a single location without any kind of CDN in front, so unless by coincidence you happen to live next door then you seem to be managing. CDNs do help, but just not bundling 40MB of Javascript or doing 50 roundtrips to load a page can go a long way.
stravant 136 days ago [-]
The website you're using right now is possibly the site least in need of a CDN out of any I regularly visit.
What other popular site has zero images or video to speak of?
bflesch 136 days ago [-]
What is "high latency" nowadays? If people didn't bundle 30 MB into every HTML page, it wouldn't be needed.
Also, Cloudflare is needed due to DDoS and abuse from rogue actors, which are mostly located in specific areas. Residential IP ranges in democratic countries are not causing the issues.
ptidhomme 136 days ago [-]
Aren't botnets targeting cheap, unsecured consumer devices, specifically in residential IP ranges?
bflesch 136 days ago [-]
Of course they are, but these botnets are actively combated by the ISPs.
The main bad traffic that I receive comes from server IP ranges all over the world and several rogue countries who think it makes sense to wage hybrid war against us. But residential IP ranges are not the majority of bad traffic.
I would even say that residential IP ranges are most of the paying customers for companies, and if you just block everything else you most likely wouldn't need to use cloudflare.
Unfortunately firewall technology is not there yet. It's quite hard to block entire countries, even harder to block any non-residential ASN. And then you can still add some open source "i am human" captcha solution before you need to use cloudflare.
steve1977 136 days ago [-]
That stupid Cloudflare check page often adds orders of magnitude more latency than a few thousand miles of cable would. Also, most applications and websites are not that sensitive to latency anyway, at least when done properly.
cess11 136 days ago [-]
Sure, why not?
simultsop 136 days ago [-]
To mars data center right?
dontlaugh 136 days ago [-]
With what capital exactly?
cowsandmilk 136 days ago [-]
I’m sick of people saying this. The truth at every cloud provider:
1. There were outages under the old guard.
2. The new guard is operating systems that are larger than what the old guard operated.
gosub100 136 days ago [-]
You don't think companies try to save costs by hiring the cheapest and dumbest people?
kylecazar 136 days ago [-]
I don't think that's the exact mechanism, no.
They might go on a hiring freeze, cancel a role, or in some cases pass on someone asking too much... But I don't think any major players are actively out trawling for "cheap and dumb". Certainly not Cloudflare, AWS and Google.
They haven't had an incident that bad since they switched from C to Rust.
antoinealb 136 days ago [-]
The linked article is precisely about how in 2024 they started rewriting their proxy layer from nginx (a C app). While "They haven't had an incident that bad since they switched from C to Rust." might be true, it has also been almost 9 years since cloudbleed, of which 8 were in C world.
noosphr 136 days ago [-]
Yes, it's been a great two months.
winternewt 136 days ago [-]
What are you implying by linking to that article?
RamRodification 136 days ago [-]
I'm not the person you are replying to, but like all of technology, you just find the latest (or most public) change made, and then fire your blame-cannon at it.
Excel crashed? Must be that new WiFi they installed!
ErroneousBosh 136 days ago [-]
"Ever since you replaced my wiper blades the clutch has been slipping"
amiga386 136 days ago [-]
In the chain of events that led to Cloudflare's largest ever outage, code they'd rewritten from C to Rust was a significant factor. There are, of course, other factors that meant the Rust-based problem was not mitigated.
They expected a maximum config size but an upstream error meant it was much larger than normal. Their Rust code parsed a fraction of the config, then did ".unwrap()" and panicked, crashing the entire program.
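The `.unwrap()` path described here can be sketched. This is a minimal, hypothetical reconstruction, not Cloudflare's actual code: `parse_features` and `MAX_FEATURES` are invented names, and the real system involved a generated feature file rather than an in-memory list. The point is only the difference between `.unwrap()`, which aborts the process on `Err`, and matching on the `Result`, which lets the caller keep the last known-good config.

```rust
const MAX_FEATURES: usize = 200;

/// Parse a feature config, enforcing a hard size limit.
/// (Illustrative names; not Cloudflare's real parser.)
fn parse_features(lines: &[&str]) -> Result<Vec<String>, String> {
    if lines.len() > MAX_FEATURES {
        return Err(format!(
            "config has {} entries, limit is {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(lines.iter().map(|s| s.to_string()).collect())
}

fn main() {
    // Simulate the upstream bug: a config roughly twice the expected size.
    let oversized = vec!["rule"; 400];

    // `parse_features(&oversized).unwrap()` would panic here and take the
    // whole process down. Handling the Err keeps the service up.
    match parse_features(&oversized) {
        Ok(features) => println!("loaded {} features", features.len()),
        Err(e) => println!("rejected config, keeping last good one: {}", e),
    }
}
```

Rust makes the error explicit in the type, but whether the process degrades gracefully or dies still depends entirely on what the caller does with the `Err`.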
This validated a number of things that programmers say in response to Rust advocates who relentlessly badger people in pursuit of mindshare and adoption:
* memory errors are not the only category of errors, or security flaws. A language claiming magic bullets for one thing might nonetheless be worse at another thing.
* there is no guarantee that if you write in <latest hyped language> your code will have fewer errors. If anything, you'll add new errors during the rewrite
* Rust has footguns like any other language. If it gains common adoption, there will be doofus programmers using it too, just like the other languages. What will the errors of Rust doofuses look like, compared to C, C++, C#, Java, JavaScript, Python, Ruby, etc. doofuses?
* availability is orthogonal to security. While there is a huge interest in remaining secure, if you design for "and it remains secure because it stops as soon as there's an error", have you considered what negative effects a widespread outage would cause?
kortilla 136 days ago [-]
This is generally BS apologetics for C. If that was in C, this would have just overrun the statically allocated memory and resulted in a segfault.
Rust did its job and forced them to return an error from the lower function. They explicitly called a function to crash if that returned an error.
That’s not a rust problem.
amiga386 136 days ago [-]
We don't know how the C program would have coped. It could equally have ignored the extra config once it reached its maximum, which would cause new problems but not necessarily cause an outage. It could've returned an error and safely shut down the whole program (which would result in the same problem as Rust panicking).
What we do know is Cloudflare wrote a new program in Rust, and never tested their Rust program with too many config items.
You can't say "Rust did its job" and blame the programmer, any more than I can say "C did its job" when a programmer tells it to write to the 257th index of a 256 byte array, or "Java did its job" when some deeply buried function throws a RuntimeException, or "Python did its job" when it crashes a service that has been running for years because for the first time someone created a file whose name wasn't valid UTF-8.
Footguns are universal. Every language has them, including Rust.
You have to own the total solution, no matter which language you pick. Switching languages does not absolve you of this. TANSTAAFL.
kortilla 135 days ago [-]
> You can't say "Rust did its job" and blame the programmer,
You absolutely can. This is someone just calling panic in an error branch. Rust didn’t overrun the memory which would have been a real possibility here in C.
The whole point is that C could have failed in the exact same way, but it would have taken extra effort to even get it to detect the issue and exit. For an error the programmer didn't intend to handle, like in this case, it likely would have just segfaulted because they wouldn't bother to bounds-check.
> TANSTAAFL
The way C could have failed here is a superset of how Rust would. Rust absolutely gives you free lunch, you just have to eat it.
stingraycharles 136 days ago [-]
“haha Rust is bad” or something is a silly take. These things hardly, if ever, are due to programming language choice, and rather due to complicated interactions between different systems.
skywhopper 136 days ago [-]
Cloudflare was crowing that their services were better because “We write a lot of Rust, and we’ve gotten pretty good at it.”
The last outage was in fact partially due to a Rust panic because of some sloppy code.
Yes, these complex systems are way more complex than just which language they use. But Cloudflare is the one who made the oversimplified claim that using Rust would necessarily make their systems better. It’s not so simple.
EugeneOZ 136 days ago [-]
You can write sloppy code using any language.
f311a 136 days ago [-]
> A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.
tietjens 136 days ago [-]
does this mean we can blame React Server Components for something new?
osener 136 days ago [-]
Listen to the sound of HN hawks erupting with joy when they realize they can blame JS, React, RSC, Rust, Cloudflare, and the cloud all for one outage.
tietjens 136 days ago [-]
For those hawks, Christmas has come early.
johncolanduoni 136 days ago [-]
I always suspected RSC was actually a secret Facebook plan to sabotage the React ecosystem now that their competitors all use it to some degree. Now I’m convinced.
girvo 136 days ago [-]
I mean RSC wasn’t really even the FB folks as far as I remember, they barely control React anymore
__turbobrew__ 136 days ago [-]
When are they going to figure out that canary deployments are a good idea? Do they just push every change straight to prod globally?
That blog post made it to the front page of HN and my site did not go down. Nor did any DDoS network take the site out even though I also challenged them last time by commenting that I would be okay with a DDoS. I would figure out a way around it.
In general, marketing often works via fear; that's why Cloudflare has those blog posts talking about the "largest botnet ever". Advertising for medicine, for example, also often works via fear. "Take this or you die", essentially.
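The canary-deployment question above can be sketched concretely. This is a minimal, hypothetical percentage-based rollout gate, not anything Cloudflare actually runs; `in_canary` and the customer ids are illustrative. A stable hash of the customer id decides who is on the new code path, so a bad change can be caught at 1% before it reaches everyone:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A customer is on the new code path only if its hashed id falls under
/// the current rollout percentage. DefaultHasher::new() uses fixed keys,
/// so the bucket is stable: ramping 1% -> 25% -> 100% only ever adds
/// customers, never flip-flops who is enrolled.
fn in_canary(customer_id: &str, rollout_percent: u64) -> bool {
    let mut h = DefaultHasher::new();
    customer_id.hash(&mut h);
    (h.finish() % 100) < rollout_percent
}

fn main() {
    // Illustrative customer ids, not real tenants.
    let customers = ["shop-a", "shop-b", "blog-c", "saas-d"];
    for pct in [1, 25, 100] {
        let enrolled = customers.iter().filter(|c| in_canary(c, pct)).count();
        println!("{pct}% rollout: {enrolled}/{} customers on new code", customers.len());
    }
}
```

Whether the slicing is by customer bucket or by datacenter, the idea is the same: a bad deploy hits one slice first instead of the whole network at once.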
em500 136 days ago [-]
Yes, marketing often works via fear. And decision making in organizations often works through blame shifting and diffusion of accountability. So organizations will just stick with centralization and Cloudflare, AWS, Microsoft et al regardless of technical concerns.
peanut-walrus 136 days ago [-]
Cloudflare is widely used because it's the easiest way to run a website for free or expose local services to internet. I think for most cloudflare users, the ddos protection is not the main reason they're using it.
miyuru 136 days ago [-]
I am using cloudflare because the origin servers are IPv6 only.
bo1024 136 days ago [-]
Cloudflare hosts websites for free?
peanut-walrus 136 days ago [-]
Yup, the free plan is quite generous.
numlock86 136 days ago [-]
Yes, they have free plans.
YouAreWRONGtoo 136 days ago [-]
[dead]
Jsmith4523 136 days ago [-]
You know what, maybe AI is taking all the goddamn jobs
akKsbba 136 days ago [-]
They’re a global company that offshores with location based pay and utilizes H1Bs. I think that’s the first thing to look at. You get what you pay for.
Stop trying to devalue labor. Not much sympathy when you’re obviously cutting corners.
_fizz_buzz_ 136 days ago [-]
Just because someone is on an H1B visa doesn't mean they know less. It's a bit rich to blame this on foreign workers even though nothing is known about who or what caused this outage.
renegade-otter 136 days ago [-]
The problem with H1B is that these people are effectively prisoners. The market is not so hot right now even for those who have leverage, but combine it with the visa system and you get this "gotta do the needful" attitude to please the bosses, rushing broken fixes to production.
gosub100 136 days ago [-]
I see this directly on my team. The h1bs get bullied by their boss (it's a split team, I work with him but don't report to him) and they don't say anything because he could effectively have them deported. At least 2 of them have kids here and perhaps the others do. So not only does it incentivize the bully to do it, but it traps them to just take it for their family. I openly talk shit back to him because he can't deport me.
greenchair 136 days ago [-]
Knowledge and tech skills are not the only factors that lead to subpar outcomes in these scenarios. In my experience, the thing that causes the most problems with H1Bs is weak English and related communication issues.
codingdave 136 days ago [-]
In my experience, the communication problems stem from the Americans who expect perfect English from all others. English is spoken across the entire business world between people for whom it is not their first language. The accents and broken English are epic in many organizations. Yet they work through it and get things done together.
If you work harder at taking the burden upon yourself to understand others, you might be surprised how well people can learn to communicate despite differing backgrounds.
throwaway_x235 136 days ago [-]
I have the same experience as you. I have been working with many non-native English speakers from different countries, and Americans (and to some degree Brits) are usually the ones who can't follow what is said. This improves over time as they get used to different accents, but it seems it is easier for non-native speakers to understand foreign accents than native speakers in general.
I'm not saying I always understand 100% of what is said. When someone with an accent from a specific part of a country speaks super fast and is on a poor line with lots of street traffic in the background, it can be hard to follow. But usually I catch enough of it to be able to communicate.
gvurrdon 136 days ago [-]
Normally I have had very good experiences as well. My colleagues almost always speak very good English and even those who don't are understandable. Everyone is happy to conduct meetings (with many nationalities, as I work in a scientific field) in English.
Only once have I encountered a problem. A colleague berated me in front of others for speaking "difficult English" and accused me of doing this on purpose to cause trouble for them, instead of speaking proper international English like everyone else did. But I am a native English speaker with an RP accent, and we were all in England at the time, working for a British organisation. I was simply speaking normally and otherwise had no issue with this colleague, whose English was very good. I don't recall there having been any misunderstandings between us before.
kortilla 136 days ago [-]
That’s just saying the same thing. American companies have engineering quality loss when they try to collaborate with people they can’t communicate well with. Whether it’s dumb Americans or poor ESL, it’s not really relevant to the outcome because it’s the same.
akKsbba 136 days ago [-]
I’m blaming it on paying workers less. H1B, location based pay, offshoring, etc. are all ways to pay workers less.
franktankbank 136 days ago [-]
I think the implication is that people with connections who have no business touching production code are getting those jobs through fraud.
immibis 136 days ago [-]
They pretty much said this. All the big companies that had recent outages are companies that publicly embraced vibe coding.
GaryBluto 136 days ago [-]
In the 80s, a "series" of fires broke out and destroyed many homes and businesses in England, all of which had a print of a painting known as 'The Crying Boy'. The painting has ever since been rumoured to be haunted.
Obviously, 'The Crying Boy' was not the cause of the fires; it was just that most homes in 1980s England had that print, as it was a popular one, and people found a pattern where there wasn't one.
jasonvorhe 136 days ago [-]
Correlation, causation, yadda yadda. They already explained that it was some React Server Components update. Sure, it could've also been done with some AI assist, but we don't know.
These companies also don't vibe code (which would involve just prompting without editing code yourself, at least that's the most common definition).
I really hope news like these won't be followed by comments like these (not criticism of you personally) until the AI hype dies down a bit. It's getting really tiresome to always read the same oversimplified takes every time there's some outage involving centralized entities such as cloudflare instead of talking about the elephant in the room, which is their attempt of doing MITM on the majority of internet users.
poszlem 136 days ago [-]
This ignores all the companies that publicly embraced vibe coding and did NOT have outages. Not a huge fan of vibe coding, but let's keep the populism to a minimum here.
lxgr 136 days ago [-]
On top of that, humans are more than capable of causing high-impact outages as well. (It's easier with massive unforced centralization, of course.)
johncolanduoni 136 days ago [-]
All the big companies embraced vibe coding, so I’m not sure there was a natural experiment here.
ivanbalepin 136 days ago [-]
if this is referring to Cloudflare, they are not yet particularly known for any major non-sales layoffs, ai or not.
hdgvhicv 136 days ago [-]
And once again simple self hosted services remain up.
sneak 136 days ago [-]
No; a lot of people still put those behind cloudflare.
steve1977 136 days ago [-]
I would not call these simple and self-hosted then.
HackerThemAll 136 days ago [-]
There have been too many of these recently. Cloudflare is starting to look amateurish. Can't they test their stuff properly before deploying it to production?
bflesch 136 days ago [-]
Maybe they test with shopify.com before they deploy it to their important customers ;)
stanislavb 136 days ago [-]
I "envy" DownDetector these days... I wanna know how much money they are making out of these Cloudflare outages...
PascalStehling 136 days ago [-]
downdetector's downdetector shows that downdetector should not be down. Something is wrong here.
downdetectorsdowndetector.com does not load the results as part of the HTML, nor does it do any API requests to retrieve the status. Instead, the obfuscated javascript code contains a `generateMockStatus()` function that has parts like `responseTimeMs: randomInt(...)` and a hardcoded `status: up` / `httpStatus: 200`. I didn't reverse-engineer the entire script, but based on it incorrectly showing downdetector.com as being up today, I'm pretty sure that downdetectorsdowndetector.com is just faking the results.
downdetectorsdowndetectorsdowndetector.com and downdetectorsdowndetectorsdowndetectorsdowndetector.com seem like they might be legit. One has the results in the HTML, the other fetches some JSON from a backend (`status4.php`).
There are other LLMs you can ask to be absolutely, 100% sure.
Maxion 136 days ago [-]
You're absolutely right – Here is a list of current SOTA models that you can try!
Would you want me to:
- Create a list of all LLM models released in the past few months
- Let you know why my existence means you can't afford RAM anymore
- Help you learn sustenance farming so that you can feed your family in the coming AI future?
FranklinMaillot 136 days ago [-]
Not sure if this is related, but has anyone seen their allowance used up unexpectedly fast? Had Claude Code Web showing service disruption warnings, and all of a sudden I'm at 92% usage.
I'm on the pro plan, only using Sonnet and Haiku. I almost never hit the 5-hour limit, let alone in less than 2 hours.
CGamesPlay 136 days ago [-]
Did you accidentally hit tab to turn on “always thinking”? It burns tokens much faster.
FranklinMaillot 136 days ago [-]
No, I found the issue. It went all in on unit tests and wrote way too many.
ericcurtin 136 days ago [-]
Time to use some local ai with Docker Model Runner :)
The entirety of Shopify was down too, for 30 minutes.
LightBug1 136 days ago [-]
Ok, at what point is "We use Cloudflare" going to be a supply-chain red marker?
At what point does the cost outweigh the benefit?
privera13 136 days ago [-]
Down of a System
ndsipa_pomu 136 days ago [-]
WAKE UP!
ractive 136 days ago [-]
Grab a brush and put a little makeup.
bobowzki 136 days ago [-]
I host my company's website on Cloudflare Pages using Cloudflare's DNS. I don't want to move to 100% self-hosting, but I would like to have a self-hosted backup. Has anyone solved this?
skywhopper 136 days ago [-]
Having a self-hosted “backup” that is ready to go at any time means having a self-hosted server that’s always on, basically. There are lots of cheap colo or VM options out there. But the problem is going to be dealing with an outage… how do you switch DNS over when your DNS provider is down?
Well, one way is to use a different DNS provider than either of your hosting options.
You can see this is getting complicated. Might be better to take the downtime.
But if I had to make a real recommendation: I'm not aware of any time in the last decade that a static site deployed on AWS S3/CloudFront would have actually been unavailable.
amiga386 136 days ago [-]
> how do you switch DNS over when your DNS provider is down?
You list multiple nameservers.
yoursite.com. 86400 IN NS ns1.yourprovider.com.
yoursite.com. 86400 IN NS ns2.yourprovider.com.
yoursite.com. 86400 IN NS ns1.yourotherprovider.com.
yoursite.com. 86400 IN NS ns2.yourotherprovider.com.
jacquesm 136 days ago [-]
Neat trick: just do job interviews when Cloudflare is down...
willswire 136 days ago [-]
Woke up this morning with my iPhone and Apple Watch suddenly in a different time zone. Anyone else experience this?
testplzignore 136 days ago [-]
Are you by chance on an airplane?
dev_l1x_be 136 days ago [-]
Instead of figuring out a novel way of distributing content in a stateful way with security and redundancy in mind, we have created the current centralised monstrosity that we call the modern web. ¯\_(ツ)_/¯
Vivianfromtheo 136 days ago [-]
Crunchyroll down too got me and the anime community stressed
A_D_E_P_T 136 days ago [-]
If Crunchyroll is down for 30 minutes it's nbd, because you know they'll be back. If the pirate sites are down for any duration, it can be very stressful, because they can be gone for good.
thenthenthen 136 days ago [-]
We seriously need to start thinking about up-detectors.
borplk 136 days ago [-]
Incoming "Look at all this cool postmortem stuff about our fuckup" blog post. It's getting a bit old guys.
tonyhart7 136 days ago [-]
You know it's bad when DownDetector is also down
B4n4n4 136 days ago [-]
LinkedIn down
jacquesm 136 days ago [-]
That's a net positive then.
echelon 136 days ago [-]
Appears to be fixed now. Just lost 30 minutes of work.
If this is unwrap() again, we need to have a talk about Rust panic safety.
jasmes 136 days ago [-]
Time to rewrite Rust’s unwrap() in Rust obviously.
Maxion 136 days ago [-]
Does it make it worse or better if I say it's RSC?
Well, technically RSC was messed up, and then the hotfix for the messed up RSC was itself messed up. I guess there’s a lot of blame to go around.
jacquesm 136 days ago [-]
Now multiply that 'just' by the number of people affected.
bflesch 136 days ago [-]
How are these clowns deploying stuff on a Friday? It is unbelievable to me. It is not even funny any more. It seems Cloudflare is held together by marketing only. They should stop all of these stupid initiatives and keep their stack simple.
And I'm 100% sure the management responsible for this is already fueling up the Ferraris to drive to their beach house. All of us make them rich and they keep on enshittifying their product out of pure hubris.
throwaway_x235 136 days ago [-]
> How are these clowns deploying stuff on a Friday, it is unbelievable to me
I have stopped fighting this battle at work (not Cloudflare). Despite Friday being one of the most important days of the week for our customers, people still deploy the latest commit 10 minutes before they leave in the afternoon. Going on a weekend trip home to your family? No problem, just deploy and be offline for hours while you are traveling...
The response was that my way of thinking is "old school". Modern development is "fail fast" and that CI/CD with good tests and rollback fixes everything. Being afraid of deploys is "so last decade"... The problem is that our tests don't cover everything, it may not fail fast, and not all deploys can be rolled back quickly and the person who knows what their huge commit that touches multiple files actually does is unavailable!
We have had multiple issues with late afternoon deploys, but somehow we keep doing this. Funnily enough, I have noticed a pattern. Most experienced devs stop doing this after causing a couple of major downtimes, due to the massive backlash from customers while they are scrambling to fix the bug. So gradually they learn to deploy at less busy times and monitor the logs to be able to catch potential bugs early.
The problem is that not enough people have learned this lesson, because they have been lucky that their bugs have not been critical, or because they are too invested in their point of view to change. It seems that some individuals learn the hard way, but the organization has not learned, or is reluctant to push for a change due to office politics. I decided to keep my head low and let things play out, as I simply no longer care as long as management doesn't care either.
bflesch 136 days ago [-]
Yes. They even hit the full checklist:
- Friday
- Christmas time
- Affecting both shopify.com and claude.ai, so no phased deployment
- Takes 30 minutes to remediate
If they had deployed to just a single one of their high-value customers at a time, they could've spared shopify.com an hour of downtime and maybe millions in abandoned shopping carts.
pessimizer 136 days ago [-]
If you are a monopoly, there is no incentive to do anything well. You've saturated the market, the incentive is to cut costs.
In fact, there are incentives for public failures: they'll help the politicians that you bought sell the legislation that you wrote explaining how national security requires that the taxpayer write a check to your stockholders/owners in return for nothing.
heisenbit 136 days ago [-]
If the deployment was related to the React Server issue then maybe it was unavoidable.
bflesch 136 days ago [-]
Yes, but a hotfix was already in place. They chose to deploy the "proper fix" this morning, and obviously it went wrong. Also, they didn't do a phased rollout, even though it impacted high-value customers such as Shopify as well as Claude, causing significant damage. Their procedures are not good.
uyzstvqs 136 days ago [-]
Cloudflare's entire WAF depending on React is an issue in itself IMO.
bflesch 136 days ago [-]
good point, the bigger picture is even worse :)
steve1977 136 days ago [-]
Everything related to React is avoidable by not using React.
"A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components."
The bug has been known for several days, and the hotfix was already in place. So they worked on the "final fix" and chose to deploy it on a Friday morning.
poulpy123 136 days ago [-]
what about downdowndetectordetector ?
noosphr 136 days ago [-]
Another dozen or so of these, and the self-mutilation that tech companies have engaged in over the last few years with mass layoffs should finally end.
Extrapolating at current rates I guess that means April 2026.
Maksadbek 136 days ago [-]
Do we have a downdetector for downdetector?
jtrn 136 days ago [-]
I have a tentative take, and I kind of feel stupid for even claiming this, since I don't work in cloud-ops or whatever, but it's fun to try to participate. I spent some time articulating what I think is a good perspective on Cloudflare nowadays, and as a psychologist, I am primarily interested in the psychology of things.
Basically, my take is: It’s not a technical monoculture; it’s a billing psychology + inertia culture.
I don't think the internet is fragile simply because Cloudflare is so ubiquitous, because that view ignores the economic factors behind why people choose them. The situation is really a perfect bimodal distribution: at the low end, you have hobbyists and personal sites who use Cloudflare because it is the only viable free option, and at the extreme high end, you have massive enterprises that truly need that specific global capacity to scrub terabits of attack traffic.
However, I think the following perspective is important: for the vast middle ground of the internet (most standard businesses and SaaS platforms), Cloudflare could be viewed as redundant. If you are hosting on AWS, Google Cloud, or Azure, you are already sitting behind world-class infrastructure protection that rivals anything Cloudflare offers. The reason this feels like a dangerous monoculture isn't because Google or Amazon can't protect you, but because Cloudflare wins on the psychology of billing. They sell a flat-rate insurance policy against attacks, whereas the cloud giants charge for usage, which scares people.
Ultimately, the internet isn't suffering from a lack of technical alternatives to DDoS protection, nor is Cloudflare a NECESSARY single point of failure; it is just suffering from a market preference for predictable invoices over technical redundancy, and inertia, leading to an extremely high usage of Cloudflare.
So basically: Even though we are currently relying a lot on Cloudflare, we are far from vendor lock-in, and there is a clear path to live without them, given that there are many alternatives.
Maybe we could view this as a good thing, since basically medium to large-scale enterprises efficiently subsidize small and hobby-level actors?
So to summarize: The 2018-era "just use Cloudflare for everything" advice is outdated, and the following is a better philosophy:
If you're tiny: Cloudflare free tier is still a no-brainer.
If you're huge and actually get attacked: pay for Cloudflare Enterprise or equivalent.
If you're anywhere in between: seriously consider whether you need it at all. The hyperscalers are good enough, and removing Cloudflare can actually improve your availability (fewer moving parts).
I think Cloudflare thinks this way too, which is why they've been pushing Zero Trust, Workers, WARP, Access, and Magic Transit, to become the default network stack for companies, not just the default firewall.
/wall-of-text
jtrn 136 days ago [-]
Bah, I think I double-posted. Is this visible? :o
pappya_coder 136 days ago [-]
Yes
Folyd 136 days ago [-]
I can confirm it's down again
jasonsgt 136 days ago [-]
[dead]
RockRobotRock 136 days ago [-]
[flagged]
JimmaDaRustla 136 days ago [-]
This made getting paged at 4am worth it
sebmellen 136 days ago [-]
Me too man
136 days ago [-]
What other popular site has zero images or video to speak of?
Also, Cloudflare is needed due to DDoS and abuse from rogue actors, which are mostly located in specific areas. Residential IP ranges in democratic countries are not causing the issues.
The main bad traffic that I receive comes from server IP ranges all over the world and several rogue countries who think it makes sense to wage hybrid war against us. But residential IP ranges are not the majority of bad traffic.
I would even say that residential IP ranges are most of the paying customers for companies, and if you just block everything else you most likely wouldn't need to use cloudflare.
Unfortunately, firewall technology is not there yet. It's quite hard to block entire countries, and even harder to block any non-residential ASN. And then you can still add some open-source "I am human" captcha solution before you need to use Cloudflare.
1. There were outages under the old guard.
2. The new guard operates systems that are larger than what the old guard operated.
They might go on a hiring freeze, cancel a role, or in some cases pass on someone asking too much... But I don't think any major players are actively out trawling for "cheap and dumb". Certainly not Cloudflare, AWS and Google.
They haven't had an incident that bad since they switched from C to Rust.
Excel crashed? Must be that new WiFi they installed!
They expected a maximum config size but an upstream error meant it was much larger than normal. Their Rust code parsed a fraction of the config, then did ".unwrap()" and panicked, crashing the entire program.
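A minimal sketch of that failure mode (hypothetical names and limits, not Cloudflare's actual code): parsing returns a perfectly usable `Result`, and the crash only happens because the caller chooses `.unwrap()` over handling the `Err`.

```rust
// Hypothetical sketch of the failure mode; names and limits are made up.
const MAX_FEATURES: usize = 3;

// Parsing itself is recoverable: an oversized config yields an Err, not a crash.
fn parse_features(raw: &[&str]) -> Result<Vec<String>, String> {
    if raw.len() > MAX_FEATURES {
        return Err(format!(
            "config has {} entries, max is {}",
            raw.len(),
            MAX_FEATURES
        ));
    }
    Ok(raw.iter().map(|s| s.to_string()).collect())
}

fn main() {
    // Upstream error produces a config larger than the assumed maximum.
    let oversized = ["a", "b", "c", "d"];

    // What reportedly happened: .unwrap() panics on Err, taking the process down.
    // let features = parse_features(&oversized).unwrap();

    // The recoverable alternative: handle the Err, e.g. keep serving the last
    // known-good config instead of crashing.
    match parse_features(&oversized) {
        Ok(features) => println!("loaded {} features", features.len()),
        Err(e) => eprintln!("config rejected, keeping previous config: {}", e),
    }
}
```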
This validated a number of things that programmers say in response to Rust advocates who relentlessly badger people in pursuit of mindshare and adoption:
* memory errors are not the only category of errors, or security flaws. A language claiming magic bullets for one thing might nonetheless be worse at another thing.
* there is no guarantee that if you write in <latest hyped language> your code will have fewer errors. If anything, you'll add new errors during the rewrite
* Rust has footguns like any other language. If it gains common adoption, there will be doofus programmers using it too, just like the other languages. What will the errors of Rust doofuses look like, compared to C, C++, C#, Java, JavaScript, Python, Ruby, etc. doofuses?
* availability is orthogonal to security. While there is a huge interest in remaining secure, if you design for "and it remains secure because it stops as soon as there's an error", have you considered what negative effects a widespread outage would cause?
Rust did its job and forced them to return an error from the lower function. They explicitly called a function to crash if that returned an error.
That’s not a rust problem.
What we do know is Cloudflare wrote a new program in Rust, and never tested their Rust program with too many config items.
You can't say "Rust did its job" and blame the programmer, any more than I can say "C did its job" when a programmer tells it to write to the 257th index of a 256 byte array, or "Java did its job" when some deeply buried function throws a RuntimeException, or "Python did its job" when it crashes a service that has been running for years because for the first time someone created a file whose name wasn't valid UTF-8.
Footguns are universal. Every language has them, including Rust.
You have to own the total solution, no matter which language you pick. Switching languages does not absolve you of this. TANSTAAFL.
You absolutely can. This is someone just calling panic in an error branch. Rust didn’t overrun the memory which would have been a real possibility here in C.
The whole point is that C could have failed in the exact same way, but it would have taken extra effort to even get it to detect the issue and exit. For an error the programmer didn't intend to handle, like in this case, it likely would have just segfaulted because they wouldn't bother to bounds check.
> TANSTAAFL
The way C could have failed here is a superset of how Rust would. Rust absolutely gives you free lunch, you just have to eat it.
The last outage was in fact partially due to a Rust panic because of some sloppy code.
Yes, these complex systems are way more complex than just which language they use. But Cloudflare is the one who made the oversimplified claim that using Rust would necessarily make their systems better. It’s not so simple.
Some interesting DNS data https://news.ycombinator.com/item?id=46159249
That blog post made it to the front page of HN and my site did not go down. Nor did any DDoS network take the site out even though I also challenged them last time by commenting that I would be okay with a DDoS. I would figure out a way around it.
In general, marketing often works via fear; that's why Cloudflare has those blog posts talking about the "largest botnet ever". Advertising for medicine, for example, also often works via fear: "take this or you die", essentially.
Stop trying to devalue labor. Not much sympathy when you’re obviously cutting corners.
If you work harder at taking the burden upon yourself to understand others, you might be surprised how well people can learn to communicate despite differing backgrounds.
I'm not saying I always understand 100% of what is said. When someone with an accent from a specific part of a country speaks super fast and is on a poor line with lots of street traffic in the background, it can be hard to follow. But usually I catch enough of it to be able to communicate.
Only once have I encountered a problem. A colleague berated me in front of others for speaking "difficult English" and accused me of doing this on purpose to cause trouble for them, instead of speaking proper international English like everyone else did. But, I am a native English speaker with an RP accent and we were all in England at the time, working for a British organisation. I was simply speaking normally and otherwise had no issue with this colleague, whose English was very good. I don't recall there having been any misunderstandings between us before.
Obviously, 'The Crying Boy' was not the cause of the fires; it was just that most homes in 1980s England had that print, as it was a popular one, and people found a pattern where there wasn't one.
These companies also don't vibe code (which would involve just prompting without editing code yourself, at least that's the most common definition).
I really hope news like these won't be followed by comments like these (not criticism of you personally) until the AI hype dies down a bit. It's getting really tiresome to always read the same oversimplified takes every time there's some outage involving centralized entities such as cloudflare instead of talking about the elephant in the room, which is their attempt of doing MITM on the majority of internet users.
https://downdetectorsdowndetector.com/
downdetectorsdowndetectorsdowndetector.com and downdetectorsdowndetectorsdowndetectorsdowndetector.com seem like they might be legit. One has the results in the HTML, the other fetches some JSON from a backend (`status4.php`).
Would you want me to:
- Create a list of all LLM models released in the past few months
- Let you know why my existence means you can't afford RAM anymore
- Help you learn subsistence farming so that you can feed your family in the coming AI future?
I'm on the pro plan, only using Sonnet and Haiku. I almost never hit the 5-hour limit, let alone in less than 2 hours.
No cloudflare no problem
https://github.com/docker/model-runner
At what point does the cost outweigh the benefit?
Well, one way is to use a different DNS provider than either of your hosting options.
You can see this is getting complicated. Might be better to take the downtime.
But if I had to make a real recommendation, I'm not aware of any time in the last decade that a static site deployed on AWS S3/CloudFront would have actually been unavailable.
You list multiple nameservers.
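A hedged sketch of what that looks like in a zone file, with NS records split across two independent providers so one provider's outage doesn't take the whole zone offline (provider names here are made up):

```
; example.com delegated to two unrelated DNS providers
example.com.  3600  IN  NS  ns1.provider-a.net.
example.com.  3600  IN  NS  ns2.provider-a.net.
example.com.  3600  IN  NS  ns1.provider-b.org.
example.com.  3600  IN  NS  ns2.provider-b.org.
```

Resolvers will retry another listed nameserver when one fails, though you then have to keep the zone data in sync across both providers.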
If this is unwrap() again, we need to have a talk about Rust panic safety.
https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q
And I'm 100% sure the management responsible for this is already fueling up the ferraris to drive to their beach house. All of us make them rich and they keep on enshittifying their product out of pure hubris.
I have stopped fighting this battle at work (not Cloudflare). Despite Friday being one of the most important days of the week for our customers, people still deploy the latest commit 10 minutes before they leave in the afternoon. Going on a weekend trip home to your family? No problem, just deploy and be offline for hours while you are traveling...
The response was that my way of thinking is "old school". Modern development is "fail fast", and CI/CD with good tests and rollback fixes everything. Being afraid of deploys is "so last decade"... The problem is that our tests don't cover everything, it may not fail fast, not all deploys can be rolled back quickly, and the person who knows what their huge commit that touches multiple files actually does may be unavailable!
We have had multiple issues with late-afternoon deploys, but somehow we keep doing this. Funnily enough, I have noticed a pattern. Most experienced devs stop doing this after causing a couple of major downtimes, due to the massive backlash from customers while they are scrambling to fix the bug. So gradually they learn to deploy at less busy times and monitor the logs to be able to catch potential bugs early.
The problem is that not enough people have learned this lesson, because they have been lucky that their bugs have not been critical, or because they are too invested in their point of view to change. It seems that some individuals learn the hard way, but the organization has not learned, or is reluctant to push for a change due to office politics. I decided to keep my head down and let things play out, as I simply no longer care as long as management doesn't care either.
- Friday
- Christmas time
- Affecting both shopify.com and claude.ai, so no phased deployment
- Takes 30 minutes to remediate
If they had deployed to just a single one of their high-value customers at a time, they could've spared shopify.com an hour of downtime and maybe millions in abandoned shopping carts.
In fact, there are incentives for public failures: they'll help the politicians that you bought sell the legislation that you wrote explaining how national security requires that the taxpayer write a check to your stockholders/owners in return for nothing.
"A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components."
The bug had been known for several days, and the hotfix was already in place. So they worked on the "final fix" and chose to deploy it on a Friday morning.