>> Test it yourself, GPT 120B OSS is cheap and available. BTW, this is why, with this bug, the stronger the model you pick (but not strong enough to discover the true bug), the less likely it is to claim there is a bug.
I guess this is the crux of the debate. All the claims are comparing models that are available freely with a model that is available only to limited customers (Mythos). The problem here is with the phrase "better model". Better how? Is it trained specifically on cybersecurity? Is it simply a large model with a higher token/thinking budget? Is it a better harness/scaffold? Is it simply a better prompt?
I don't doubt that some models are stronger than other models (a Gemini Pro or a Claude Opus has more parameters and larger context sizes, and was probably trained for longer and on more data, than its smaller counterpart (Flash and Sonnet respectively)).
Unless we know the exact experimental setup (which in this case is impossible because Mythos is completely closed off and not even accessible via API), all of this is hand wavy. Anthropic is definitely not going to reveal their setup because whether or not there is any secret sauce, there is more value to letting people's imaginations fly and the marketing machine work. Anthropic must be jumping with joy at all the free publicity they are getting.
antirez 12 days ago [-]
In the Anthropic Mythos model card they explicitly remarked that they didn't want Mythos to be specifically good at security. They trained it to be good at coding, and as a side effect the model is (obviously) good at security. This is what happens with flesh-and-blood hackers too, mostly. Hackers are very good programmers; as a side effect, they understand systems well enough that their understanding has security implications.
Hendrikto 12 days ago [-]
Model cards are just marketing material. I wouldn’t trust them one bit.
antirez 12 days ago [-]
You don't need to trust anyone. GPT 5.4 xhigh is available and you can test it for $20, to verify it is actually able to find complex bugs in old codebases. Do the work instead of denying AI can do certain things. It's a matter of an afternoon. Or, trust the people that did this work. See my YouTube video where I find tons of Redis bugs with GPT 5.4.
Hendrikto 12 days ago [-]
I did not claim or deny anything. You cited the model card; I just pointed out that it is not a reliable source. If you have better sources, like your YT video, you should cite those instead.
otterley 12 days ago [-]
You are claiming something: that the model card is not reliable and therefore as useful as nothing. Sowing doubt without offering a possible solution adds little value to the conversation. Moreover, your rebuttal is unsubstantiated.
cyanydeez 12 days ago [-]
Guys, think about all the security vulnerabilities you're aware of; now think about how many of those you know how to technically reproduce. Now imagine that you actually don't know how to reproduce most of them and will never actually be able to judge the result.
Well, just because these are all AI people doesn't mean they've verified enough of the output of these models to actually support the significant security implications they're advertising.
ncjfieuauahwi 12 days ago [-]
[dead]
mbesto 12 days ago [-]
And benchmarks can easily be gamed by overfitting. Yet here we are, with the top HN comment on the HN Mythos thread outlining its benchmark performance gains.
I guess we'll never learn.
Yokohiii 12 days ago [-]
The whole discussion started out as an attempt to disprove/verify Anthropic's (model card) claims.
He also transfers the logic of their claims to the actual real world. You can say that model cards are marketing garbage, but then you have to prove that experienced programmers are not significantly better at security.
root_axis 12 days ago [-]
> You have to prove that experienced programmers are not significantly better at security.
That has not been my experience. It's true that they are "better at security" in the sense that they know to avoid common security pitfalls like unparameterized SQL, but essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.
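For what it's worth, here's a minimal sketch of that pitfall using the SQLite C API (purely illustrative; the table, column, and function names are made up):

    /* Illustrative sketch (hypothetical table/column names), SQLite C API. */
    #include <stdio.h>
    #include <sqlite3.h>

    /* Unparameterized: attacker-controlled `name` is spliced into the SQL
       text, so input like  x' OR '1'='1  changes the query's meaning. */
    static int lookup_unsafe(sqlite3 *db, const char *name) {
        char sql[256];
        snprintf(sql, sizeof(sql),
                 "SELECT id FROM users WHERE name = '%s';", name);
        return sqlite3_exec(db, sql, NULL, NULL, NULL);
    }

    /* Parameterized: the query shape is fixed at prepare time and the
       input is bound as data, never parsed as SQL. */
    static int lookup_safe(sqlite3 *db, const char *name) {
        sqlite3_stmt *stmt = NULL;
        int rc = sqlite3_prepare_v2(db,
            "SELECT id FROM users WHERE name = ?;", -1, &stmt, NULL);
        if (rc != SQLITE_OK) return rc;
        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
        while (sqlite3_step(stmt) == SQLITE_ROW) { /* consume rows */ }
        return sqlite3_finalize(stmt);
    }

Knowing to write the second version is exactly the kind of knowledge most experienced programmers have; spotting the first version buried in an unfamiliar codebase is a different skill.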
Yokohiii 12 days ago [-]
An expert-level human doesn't have to be an expert in every programming category. A webdev wouldn't spot a use-after-free; a systems engineer wouldn't know about CSRF. That is, assuming neither researches security beyond their field. Requiring a programmer to apply their knowledge to an arbitrary system is asking too much. On the other hand, an LLM can be expert-level in every programming field, able to spot and combine vulnerabilities creatively. That is all pretty hard, and I don't think a security expert with vast knowledge would say "that's easy".
My point is that more experienced programmers are better at security on average, not that they are security experts.
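To make the use-after-free mentioned above concrete, here is a minimal, contrived C sketch of the kind of bug a systems person spots on sight but a webdev may never have had to think about:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *token = malloc(32);
        if (!token) return 1;
        strcpy(token, "session-token");
        free(token);

        /* Bug: `token` still points at freed memory. The allocator may have
           reused it already, so reading (or worse, writing) through it is
           undefined behaviour and a classic exploitation target. */
        printf("%s\n", token);  /* use after free */
        return 0;
    }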
tracker1 12 days ago [-]
I would think pwn2own competitions signal the opposite. I'm consistently amazed at how a unique combination of exploits can be chained into a larger one, often in ways that most wouldn't even consider. I think it takes a level of knowledge, experience, creativity, and paranoia to be really good with security issues all around as a person.
inetknght 12 days ago [-]
> essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.
I've found it to be the opposite. Many of them do have the ability to apply their knowledge in that fashion. They're just either not incentivised to do so, or incentivised to not do so.
2983592 12 days ago [-]
But they are treated as holy scripture ...
zahlman 12 days ago [-]
> Hackers are very good programmers
This does not match my experience.
ang_cire 12 days ago [-]
The missing part of their intended meaning is "skilled hackers". Unskilled hackers are everywhere, and they're bad at programming, but so are unskilled programmers.
rakejake 12 days ago [-]
>>> the model is (obviously) good at security
Out of curiosity, are you one of the people who has access to the model? If yes, could you write about your experimental setup in more detail?
_the_inflator 11 days ago [-]
Yep. Some are, while others are more or less leeching forums, exploiting known risks, and using tools.
But the ones that really find certain bugs are exceptional. Almost all are very hardware-savvy and do assembler stuff. That alone is an impressive feat. I still enjoy 6510 and M68000 assembler here and there, as a former scener who mainly coded demos and occasionally improved games (so-called trainers) or cracked a few.
To be honest, the assembler guys always scare me, because with assembler you can poke a hole in almost anything. No one in their right mind uses assembler on x86 for professional development besides a few special cases. But Python etc. serve many MB of executable code for the abstraction, and 20 bytes of assembler just kills it…
Glemllksdf 12 days ago [-]
If it's really more expensive per token, it might have more parameters and would then be able to hold more context/scope of code.
Rumors say it has 10 trillion parameters vs. 1 trillion.