What's particularly impressive about Telum II isn’t just the cache size or even the architecture—it’s the deliberate trade-off IBM makes for ultra-low latency L2, almost as a replacement for traditional L3. That decision makes a lot of sense in a mainframe context where deterministic performance, low tail latency, and tight SLA adherence matter more than broad throughput per watt.
It also feels like a return to form: IBM has always optimized for workload-specific performance over general-purpose benchmarks. While x86 designs scale horizontally and are forced to generalize across consumer and datacenter workloads, IBM is still building silicon around enterprise transaction processing. In a way, this lets them explore architectural territory others wouldn’t touch—like banking on huge, fast L2 caches and deferring cross-core coordination to clever interconnect and software layers.
jfindley 1 days ago [-]
It's a shame the article chose to compare solely against AMD CPUs, because AMD and Intel have very different L3 architectures. AMD CPUs have their cores organised into groups, called a CCX, each of which has its own small L3 cache. For example the Turin-based 9755 has 16 CCXs, each with 32MB of L3 cache - far less cache per core than the mainframe CPU being described. In contrast to this, Intel uses an approach that's a little closer to the Telum II CPU being described - a Granite Rapids AP chip such as the 6960P has 432 MB of L3 cache shared between 72 physical cores, each with its own 2MB L2 cache. This is still considerably less cache, but it's not quite as stark a difference as the picture painted by the article.
This doesn't really detract from the overall point - stacking a huge per-core L2 cache and using cross-chip reads to emulate L3 with clever saturation metrics and management is very different to what any x86 CPU I'm aware of has ever done, and I wouldn't be surprised if it works extremely well in practice. It's just that it'd have made a stronger article IMO if it had instead compared dedicated L2 + shared L2 (IBM) against dedicated L2 + shared L3 (intel), instead of dedicated L2 + sharded L3 (amd).
rayiner 1 days ago [-]
Granite Rapids is also a better example because it's an enterprise processor with a huge monolithic die (almost 600 square mm).
A key distinction, however, is latency. I don't know about Granite Rapids, but sources show that Sapphire Rapids had an L3 latency around 33 ns: https://www.tomshardware.com/news/5th-gen-emerald-rapids-cpu.... According to the article, the L2 latency in the Telum II chips is just 3.8 ns (about 21 clock cycles at 5.5 GHz). Sapphire Rapids has an L2 latency of about 16 clock cycles.
IBM's cache architecture enables a different trade-off in terms of balancing the L2 versus L3. In Intel's architecture, the shared L3 is inclusive, so it has to be at least as big as L2 (and preferably, a lot bigger). That weighs in favor of making L2 smaller, so most of your on-chip cache is actually L3. But L3 is always going to have higher latency. IBM's design improves single-thread performance by allowing most of the on-chip cache to be lower-latency L2.
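To sanity-check those figures, the arithmetic is just nanoseconds times clock frequency; a tiny Java snippet using only the numbers quoted above:

    public class CacheLatencyCheck {
        public static void main(String[] args) {
            double clockGHz = 5.5;  // Telum II clock, as quoted in the article
            double l2Ns = 3.8;      // Telum II L2 latency, as quoted in the article
            double l3Ns = 33.0;     // Sapphire Rapids L3 latency, from the linked source
            System.out.printf("Telum II L2: %.1f ns * %.1f GHz = %.0f cycles%n",
                    l2Ns, clockGHz, l2Ns * clockGHz);   // ~21 cycles
            System.out.printf("A ~%.0f ns L3 hit costs roughly %.0fx more wall-clock time%n",
                    l3Ns, l3Ns / l2Ns);                 // ~9x
        }
    }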
theandrewbailey 21 hours ago [-]
> Telum II and prior IBM mainframe chips handle server tasks like financial transactions, but curiously seem to prioritize single threaded performance.
IBM was doing SMT (aka Hyperthreading) for decades, long before x86 did. I can't get a number for Telum II, but the original Telum implemented 16-way SMT per core[0], so your 8 core Telum can do 128 threads. I expect similar from Telum II.
[0] https://www.ibm.com/linuxone/telum
It must be fun being a hardware engineer for IBM mainframes: cost constraints for your designs can mostly be left aside, as there's no competition, and your still-existing customers have been domesticated to pay you top dollar every upgrade cycle - and frankly, they don't care.
Cycle times are long enough so you can thoroughly refine your design.
Marketing pressures are probably extremely well thought out, as anyone working on mainframe marketing is probably either an ex-engineer or almost an engineer by osmosis.
And the product is different enough from anything else that you can try novel ideas, but not so different that your design skills are useless elsewhere or that you can't leverage others' advances.
bgnn 1 days ago [-]
They fund a lot of R&D in house and let people try crazy new ideas. Too bad it's just for a niche product.
They have a famous networking (optical and wireline) group doing a lot of state-of-the-art research, and they deploy these in their mainframe products.
There's no other company like it. They are, in a sense, the exact opposite of Apple, where all HW engineers are pushed toward impossible deadlines and solutions that save the day, and where most in-house developed IP ends up not being competitive enough and never ships (like their generations of 5G modems and the IP blocks inside them, such as data converters).
erk__ 1 days ago [-]
There is also no other place that will just implement conversions between UTF formats, compression in hardware, or various hashes and other crypto in hardware like they do.
Yeah, there are some of those elsewhere, but IBM has stuff like Ed25519 signatures; x86 at least seems to be a bit on the back foot when it comes to implementing things like that. They would probably rather wait until AES is popular before they add it.
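For reference, here is what that primitive looks like on the software side; recent JDKs (15 and later) ship Ed25519 in the standard Signature API, and this is the operation a hardware implementation would offload. A minimal sketch, not tied to any particular platform:

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class Ed25519Demo {
        public static void main(String[] args) throws Exception {
            // Generate an Ed25519 key pair and sign a message entirely in software (JDK 15+).
            KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
            byte[] msg = "wire transfer #42".getBytes(StandardCharsets.UTF_8);

            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(kp.getPrivate());
            signer.update(msg);
            byte[] sig = signer.sign();

            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(kp.getPublic());
            verifier.update(msg);
            System.out.println("signature valid: " + verifier.verify(sig));
        }
    }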
pezezin 19 hours ago [-]
Modern consoles provide hardware decompression to transfer compressed assets from the NVME directly to the GPU. Similar functionality is coming to PCs in the form of DirectStorage and Vulkan extensions.
erk__ 16 hours ago [-]
Yeah, the newest PlayStation has a hardware implementation of Oodle, which is really cool. z/Architecture has an inflate/deflate implementation, which makes it a bit more general than texture decompression.
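For comparison, the software analogue of such an inflate/deflate unit is the ordinary DEFLATE codec in java.util.zip; a hardware implementation takes exactly this kind of work off the general-purpose cores. A minimal sketch (the sample payload is made up):

    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public class DeflateDemo {
        public static void main(String[] args) throws Exception {
            byte[] input = "some highly repetitive record data ".repeat(100).getBytes();

            // Compress (DEFLATE) in software; this is the work a hardware unit offloads.
            Deflater deflater = new Deflater();
            deflater.setInput(input);
            deflater.finish();
            byte[] compressed = new byte[input.length];
            int clen = deflater.deflate(compressed);  // one call suffices: large output buffer

            // Decompress (INFLATE) back to the original bytes.
            Inflater inflater = new Inflater();
            inflater.setInput(compressed, 0, clen);
            byte[] restored = new byte[input.length];
            int rlen = inflater.inflate(restored);

            System.out.printf("original %d bytes -> compressed %d -> restored %d%n",
                    input.length, clen, rlen);
        }
    }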
AtlasBarfed 7 hours ago [-]
Hardware decompression can increase overall processing performance over uncompressed data because I/O is slow.
Heck, even without hardware decompression it is often faster.
Considering the mainframe's sweet spot is I/O processing, that's not surprising.
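A back-of-envelope model of that point, with made-up but plausible numbers; the link speed, compression ratio, and decompression rate below are assumptions, not measurements:

    public class CompressedIoModel {
        public static void main(String[] args) {
            double dataGB     = 100.0; // logical data to read (assumed)
            double linkGBps   = 3.0;   // storage/interconnect throughput (assumed)
            double ratio      = 3.0;   // compression ratio (assumed)
            double decompGBps = 8.0;   // decompression rate (assumed)

            double plainSecs = dataGB / linkGBps;
            // Worst case: read the compressed stream, then decompress serially.
            double compressedSecs = (dataGB / ratio) / linkGBps + dataGB / decompGBps;

            System.out.printf("uncompressed read:          %.1f s%n", plainSecs);      // ~33.3 s
            System.out.printf("compressed read + inflate:  %.1f s%n", compressedSecs); // ~23.6 s
            // Overlapping the I/O with (hardware) decompression shrinks the second number further.
        }
    }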
sillywalk 24 hours ago [-]
Nitpick: the SPARC M7/M8 have hardware compression.
> Cost constraints for your designs can be mostly be left aside, as there's no competition
I don't think they can ignore the migration of workloads to cloud platforms. Mainframes can only cost so much before it's cheaper to migrate the workloads to other platforms and rewrite some SLAs. They did a ton of work on this generation to keep the power envelope in the same ballpark as the previous generation, because that's what their clients were concerned about.
jonathaneunice 1 days ago [-]
Virtual L3 and L4 swinging gigabytes around to keep data at the hot end of the memory-storage hierarchy even post L2 or L3 eviction? Impressive! Exactly the kind of sophisticated optimizations you should build when you have billions of transistors at your disposal. Les Bélády's spirit smiles on.
zozbot234 1 days ago [-]
Virtual L3 and L4 looks like a bad deal today since SRAM cell scaling has stalled quite badly in recent fabrication nodes. It's quite possible that future chip designs will want to use eDRAM at least for L4 cache if not perhaps also L3, and have smaller low-level caches where "sharing" will not be as useful.
dragontamer 1 days ago [-]
Does it?
Recycling SRAM when it becomes more precious seems like a better strategy than letting that precious SRAM sit idle on otherwise sleeping cores.
adgjlsfhk1 23 hours ago [-]
SRAM scaling appears to have recovered in the new nodes with GAAFET and backside power delivery.
exabrial 1 days ago [-]
What languages are people still writing mainframe code in? In 2011, working for a prescription (Rx) processor, COBOL was still the name of the game.
rbanffy 1 days ago [-]
There's lots of Java as well, and IBM is making a big effort to port existing Unix utilities to z/OS (which is a certified UNIX). With Linux, the choices are the same as on other hardware platforms. I assume you'll find lots of Java and Python running on LinuxONE machines.
Running Linux, from a user's perspective, it feels just like a normal server with a fast CPU and extremely fast IO.
jandrewrogers 1 days ago [-]
How fast is “extremely fast”? Normal x86 Linux servers drive multiple 100s of GB/s of I/O these days. Storage is only slow because cloud.
rbanffy 1 days ago [-]
I never benchmarked it, but the latency feels very low. Mainframes don't have any local storage, so anything they're using will be a box in a separate rack (or spanning multiple racks).
jiggawatts 1 days ago [-]
> extremely fast IO.
I wonder how big a competitive edge that will remain in an era where ordinary cloud VMs can do 10 GB/s to zone-redundant remote storage.
Cthulhu_ 1 days ago [-]
GB/s is one metric, but IOPS and latency are others that I'm assuming are Very Important for the applications that mainframes are being used for today.
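A rough sketch of how the three metrics relate (Little's law); the latency, queue depth, and block size below are illustrative assumptions:

    public class IoMetrics {
        public static void main(String[] args) {
            double latencySecs = 100e-6; // assumed 100 microseconds per I/O
            int queueDepth     = 32;     // assumed I/Os kept in flight
            int blockBytes     = 4096;   // assumed 4 KiB requests

            double iops = queueDepth / latencySecs;  // Little's law: ~320,000 IOPS
            double gbps = iops * blockBytes / 1e9;   // ~1.3 GB/s

            System.out.printf("%.0f IOPS at QD%d -> %.2f GB/s with %d-byte blocks%n",
                    iops, queueDepth, gbps, blockBytes);
            // The same GB/s figure can hide very different latency/IOPS profiles,
            // which is why all three metrics matter.
        }
    }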
imtringued 1 days ago [-]
IOPS is the most meaningless metric there is. It's just a crappy way of saying bandwidth with an implied sector size. 99% of software developers do not use any form of async file IO and therefore couldn't care less. The async file IO support in Postgres was released a month ago. It's that niche of a thing that even extremely mature software that could heavily benefit from it didn't bother implementing it until last month.
jiggawatts 1 days ago [-]
Microsoft SQL Server has been using async scatter/gather IO APIs for decades. Most database engines I've worked with do so.
Postgres is weirdly popular despite being way, way behind on foundational technology adoption.
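For readers who haven't used the pattern being discussed, here is a minimal sketch of asynchronous file I/O with the JDK's standard AsynchronousFileChannel; the file name and buffer size are illustrative, and the scatter/gather aspect (several buffers per request) is omitted for brevity:

    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousFileChannel;
    import java.nio.channels.CompletionHandler;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class AsyncReadDemo {
        public static void main(String[] args) throws Exception {
            AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                    Paths.get("datafile.bin"), StandardOpenOption.READ);

            ByteBuffer buf = ByteBuffer.allocateDirect(8192);
            // Submit the read and return immediately; the callback fires on completion.
            ch.read(buf, 0, buf, new CompletionHandler<Integer, ByteBuffer>() {
                @Override public void completed(Integer bytesRead, ByteBuffer b) {
                    System.out.println("read " + bytesRead + " bytes");
                }
                @Override public void failed(Throwable exc, ByteBuffer b) {
                    exc.printStackTrace();
                }
            });

            // The submitting thread is free to queue more reads or do other work here.
            Thread.sleep(1000);  // crude wait so the demo doesn't exit before the callback
            ch.close();
        }
    }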
rbanffy 1 days ago [-]
> Microsoft SQL Server has been using async scatter/gather IO APIs for decades. Most database engines I've worked with do so.
Windows NT has had asynchronous IO since its VAX days ;-)
> Postgres is weirdly popular despite being way, way behind on foundational technology adoption.
It's good enough, free, and performs well.
jiggawatts 22 hours ago [-]
I constantly hear about VACUUM problems and write amplification causing performance issues bad enough that huge users of it were forced to switch to MySQL instead.
pgaddict 20 hours ago [-]
I've been involved in a couple of those cases, where a large company ran into an issue, and chose to solve it by migrating to something else. And while the issues certainly exist (and are being addressed), the technical reasons often turned out to be a rather tiny part of the story. And in the end it was really about internal politics and incentives.
In several such cases, the company was repeatedly warned about how they implemented some functionalities, and that it will cause severe issues with bloat/vacuuming, etc. Along with suggestions how to modify the application to not hit those issues. Their 10x engineers chose to completely ignore that advice, because in their minds they constructed an "ideal database" and concluded that anything that behaves differently is "wrong" and it's not their application that should change. Add a dose of politics where a new CTO wants to rebuild everything from scratch, engineers with NIH syndrome, etc. It's about incentives - if you migrate to a new system, you can write flashy blog posts how the new system is great and saved everything.
You can always argue the original system would be worse, because everyone saw it had issues - you just leave out the details about choosing not to address the issues. The engineering team is unlikely to argue against that, because that'd be against their interests too.
I'm absolutely not claiming the problems do not exist. They certainly do. Nor am I claiming Postgres is the ideal database for every possible workload. It certainly is not. But the worst examples that I've seen were due to deliberate choices, driven by politics. But that's not described anywhere. In public everyone pretends it's just about the tech.
rbanffy 10 hours ago [-]
Politics is an unavoidable aspect of larger groups, but it gets a lot worse when coupled with wrong incentives that reward heroic disaster mitigation over active disaster avoidance.
When you design a system around a database, it pays off to plan ahead for the performance issues you might face in the future - often just a simple document explaining which directions to evolve the system in, based on the perceived cause. You might want to add extra read replicas, introduce degraded modes for when writes aren't available, move some functions to their own databases, shard big tables, and so on. With a somewhat clear roadmap, your successors don't need to panic when the next crisis appears.
For extra points, leave recordings dressed as Hari Seldon.
FuriouslyAdrift 1 days ago [-]
Latency is much more important than throughput...
inkyoto 1 days ago [-]
Guaranteed sustained write throughput is a distinguishing feature of mainframe storage.
Whilst cloud platforms are the new mainframe (so to speak) and have all made great strides in improving their SLA guarantees, storage is still accessed over the network (plus extra moving parts: coordination, consistency, etc.). They will get there, though.
RetroTechie 1 days ago [-]
On-site.
Speed is not the only reason why some org/business would have Big Iron in their closet.
bob1029 1 days ago [-]
You can do a lot of damage with some stored procedures. SQL/DB2 capabilities often go overlooked in favor of virtualizing a bunch of Java apps that accomplish effectively the same thing with 100x the resource use.
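A hedged sketch of what that looks like from the application side: calling a stored procedure over plain JDBC so the row-by-row work stays inside the database engine. The connection string, schema, and procedure name are hypothetical, and the Db2 JDBC driver is assumed to be on the classpath:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class CallProcDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical Db2 connection string and stored procedure.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:db2://dbhost:50000/SAMPLE", "user", "password");
                 CallableStatement cs = conn.prepareCall("{CALL FINANCE.POST_BATCH(?, ?)}")) {

                cs.setInt(1, 20250901);                    // IN: batch id (made up)
                cs.registerOutParameter(2, Types.INTEGER); // OUT: rows posted
                cs.execute();

                // All the row-by-row work happened next to the data, inside the engine.
                System.out.println("rows posted: " + cs.getInt(2));
            }
        }
    }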
exabrial 1 days ago [-]
Hah, anecdote incoming, but 100x the resource usage is probably accurate. Granted, 100x a human hair is still just a minuscule grain of sand, but those are the margins mainframe operators work within.
As one greybeard put it to me: Java is loosely typed and dynamic compared to COBOL/DB2/PL-SQL. He was particularly annoyed that the smallest numerical type in Java, a 'byte', was, quote, "a waste of bits", and that Java was full of "useless bounds checking", both of which were causing "performance regressions".
The way mainframe programs are written, the entire thing is statically typed.
PaulHoule 1 days ago [-]
I knew mainframe programmers were writing a lot of assembly in the 1980s and they probably still are.
BugheadTorpeda6 1 days ago [-]
It's one of the last platforms that people still write a lot of hand-written assembly for. Part of this is down to the assembler being very ergonomic and there being a very capable macro system available, with macros provided for most system services. Part of it is due to the OS predating C becoming popular, so there are no header files (just assembler macros) for many older system services (you can thunk them into C-compatible calls, but it's sometimes more of a headache than just writing in assembler). Definitely C is becoming more popular lately on the platform, but you will still find a lot of people programming in assembler for a living in 2025. It's probably the only subfield of programming that uses handwritten assembly in that way outside of embedded systems.
rbanffy 1 days ago [-]
> the entire thing is statically typed.
Not always, but they do have excellent performance analysis and will do what they can to avoid slow code in the hottest paths.
thechao 1 days ago [-]
When I was being taught assembly at Intel, one of the graybeards told me that the greatest waste of an integer was to use it for a "bare" add, when it was a perfectly acceptable 64-wide vector AND. To belabor the point: he used ADD for the "unusual set of XORs, ANDs, and other funky operations it provided across lanes". Odd dude.
dragontamer 1 days ago [-]
Reverse engineering the mindset....
In the 90s, a cryptography paper was published that brute-forced DES (the standard encryption algorithm back then) more quickly by using SIMD-style operations across 64-bit registers on a DEC Alpha.
There is also the 80s Connection Machine which was a 1-bit SIMD x 4096-lane supercomputer.
---------------
It sounds like this guy read a few 80s or 90s papers and then got stuck in that unusual style of programming. But there were famous programs back then that worked off of 1-bit SIMD x 64 lanes or x4096 lanes.
By the 00s, computers had already moved on to new patterns (and this obscure methodology was never mainstream). Still, I can imagine that if a student read a specific set of papers in those decades, this kind of mindset would get stuck.
formerly_proven 22 hours ago [-]
That's bit-slicing (not the hardware technique).
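A minimal sketch of the idea, in case it is unfamiliar: treat one 64-bit register as 64 independent 1-bit lanes, so a single AND or XOR evaluates the same boolean gate for 64 separate problem instances at once:

    public class BitsliceDemo {
        public static void main(String[] args) {
            // Two inputs for 64 independent 1-bit problems, one bit per lane.
            long a = 0b1100L;   // lanes 2 and 3 set (remaining 60 lanes are zero)
            long b = 0b1010L;   // lanes 1 and 3 set

            // One instruction each, but logically 64 gates evaluated in parallel:
            long xor = a ^ b;   // 64 XOR gates -> lanes 1 and 2
            long and = a & b;   // 64 AND gates -> lane 3

            System.out.println("xor lanes: " + Long.toBinaryString(xor)); // 110
            System.out.println("and lanes: " + Long.toBinaryString(and)); // 1000
        }
    }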
uticus 1 days ago [-]
> ...it was a perfectly acceptable 64-wide vector AND.
sounds like "don't try to out-optimize the compiler."
thechao 1 days ago [-]
In 2025, for sure. In 2009 ... maybe? Of course, he had become set in his ways in the 80s and 90s.
BugheadTorpeda6 1 days ago [-]
For applications or middlewares and systems and utilities?
For applications, COBOL is king, closely followed by Java for stuff that needs web interfaces. For middleware and systems and utilities etc, assembly, C, C++, REXX, Shell, and probably there is still some PL/X going on too but I'm not sure. You'd have to ask somebody working on the products (like Db2) that famously used PL/X. I'm pretty sure a public compiler was never released for PL/X so only IBM and possibly Broadcom have access to use it.
COBOL is best thought of as a domain-specific language. It's great at what it does, but the use cases are limited; you would be crazy to write an OS in it.
pjmlp 1 days ago [-]
RPG, COBOL, PL/I, NEWP are the most used ones. Unisys also has their own Pascal dialect.
Other than that, there are Java, C, C++ implementations for mainframes, for a while IBM even had a JVM implementation for IBM i (AS/400), that would translate JVM bytecodes into IBM i ones.
Additionally, all of them have POSIX environments (think WSL-like, but for mainframes), where anything that runs on AIX, or on a few selected enterprise distros like Red Hat and SUSE, can go.
BugheadTorpeda6 1 days ago [-]
It sounds like you are referring to AS/400 and successors (common mistake, no biggie) rather than the mainframes being referred to here that are successors of System 360 and use the Telum chips (as far as I am aware they have never been based on POWER, like IBM i and AIX and the rest of the AS/400 heritage). RPG was never a big thing on mainframes for instance. I've never come across it in about 10 years of working on them professionally. Same with NEWP, I've never heard of it. And Java is pretty important on the platform these days and not an attempt from the past. It's been pushed pretty hard for at least 20 years and is kept well up to date with newer Java standards.
Additionally, the Unix on z/OS is not like WSL. There is no virtualization. The POSIX APIs are implemented as privileged system services with Program Calls (kind of like supervisor calls/system calls). It's more akin to a "flavor" kinda like old school Windows and OS/2 than the modern WSL. You can interface with the system in the old school MVS flavor and use those APIs, or use the POSIX APIs, and they are meant to work together (for instance, the TCPIP stack and web servers on the platform are implemented with the POSIX APIs, for obvious compatibility and porting reasons).
Of course, you can run Linux on mainframes and that is big too, but usually when people refer to mainframe Unix they are talking about how z/OS is technically a Unix, which I don't think it would count in the same way if it was just running a Unix environment under a virtualization layer. Windows can do that and it's not a Unix.
pjmlp 13 hours ago [-]
The question was,
> What languages are people still writing mainframe code in?
so I naturally answered with the ones I know about.
> RPG was never a big thing on mainframes for instance.
It certainly was in Portugal, back in its heyday.
> Same with NEWP, I've never heard of it.
NEWP is the systems programming language for the Burroughs B5000, an evolution of ESPOL, its original one; the platform is nowadays sold as Unisys MCP.
> And Java is pretty important on the platform these days and not an attempt from the past. It's been pushed pretty hard for at least 20 years and is kept well up to date with newer Java standards.
The attempt from the past was converting JVM bytecodes to TIMI on the fly; this approach was dropped in favour of the regular JIT.
As for the rest regarding "like WSL", I was trying to simplify, not give a conference talk on UNIX support across micro and mainframes and its evolution.
sillywalk 1 days ago [-]
Nitpick:
Almost 20 years ago IBM had the eclipz project to share technology between its POWER (System i and System P) and Mainframe (System Z) servers. I'm not sure if it counts as "based on", but
"the z10 processor was co-developed with and shares many design traits with the POWER6 processor, such as fabrication technology, logic design, execution unit, floating-point units, bus technology (GX bus) and pipeline design style."[0]
The chips were otherwise quite different, and obviously don't share the same ISA. I also don't know if IBM has kept this kind of sharing between POWER & Z.
[0] https://en.wikipedia.org/wiki/IBM_z10
They must share some similarities - it's IBM and both POWER and Z have access to the same patents and R&D. Apart from that, they are very different chips, for very different markets.
Also, I'm sure there are many little POWER-like cores inside a z17 doing things like pushing data around. A major disappointment is that the hardware management elements are x86 machines, probably the only x86 machines IBM still sells.
fneddy 1 days ago [-]
There is a thing called LinuxONE. That’s Linux for IBM mainframes. So basically you can run anything that runs on Linux (and can be compiled on s390x) on a mainframe.
fneddy 1 days ago [-]
And if you are really interested, there is the LinuxONE Community Cloud run by Marist College. If you do open source and want to add support for mainframe / s390x, you can get a free tier from them.
specialist 1 days ago [-]
Most impressive.
I would enjoy an ELI5 for the market differences between commodity chips and these mainframe grade CPUs. Stuff like design, process, and supply chain, anything of interest to a general (nerd) audience.
IBM sells 100s of Z mainframes per year, right? Each can have a bunch of CPUs, right? So Samsung is producing 1,000s of Telums per year? That seems incredible.
Given such low volumes, that's a lot more verification and validation, right?
Foundries have to keep running to be viable, right? So does Samsung bang out all the Telums for a year in one burst, then switch to something else? Or do they keep producing a steady trickle?
Not that this info would change my daily work or life in any way. I'm just curious.
TIA.
detaro 1 days ago [-]
It's something they'll run a batch for occasionally, but that's normal. Fabs are not like one long conveyor belt where a wafer goes in at the front, passes through a long sequence of machines, and falls out finished at the end. They work in batches, and machines need reconfiguring for the next task all the time (if a chip needs 20 layers, they don't have every machine 20 times), so mixing different products is normal. Low-volume products are going to be more expensive, of course, due to per-batch setup tasks being spread over fewer wafers.
In general, scheduling of machine time and transportation of FOUPs (transport boxes for a number of wafers, the basic unit of processing) is a big topic in the industry: optimizing machine usage while keeping the number of unfinished wafers "in flight" and the overall cycle time low. It takes weeks for a wafer to flow through the fab.
bob1029 1 days ago [-]
It is non-trivial to swap between product designs in a fab. It can take many lots before you have statistical process controls dialed in to the point where yields begin to pick up. Prior performance is not indicative of future performance.
BugheadTorpeda6 1 days ago [-]
It's probably more on the order of a thousand or so mainframe systems delivered per year, based on the average lifespan of one of these systems and the ~10,000 or so existing mainframe customers.
Then you have to consider that each CPC drawer in a mainframe has 4 sockets, and there are multiple CPC drawers per maxed-out system (I believe up to 4, so 16 sockets per mainframe). And some larger customers will have many mainframes set up in clusters for things like Parallel Sysplex and disaster recovery.
So probably more on the order of ~10's of thousands of these chips getting sold per year.
So definitely not high volume in CPU-manufacturing terms, but it's not minuscule.
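The rough arithmetic behind that estimate, using only the ballpark figures from this thread (not IBM numbers):

    public class TelumVolumeGuess {
        public static void main(String[] args) {
            int systemsPerYear   = 1000; // ballpark from the parent comments
            int drawersPerSystem = 4;    // "up to 4" CPC drawers, per the comment above
            int socketsPerDrawer = 4;

            int chipsPerYear = systemsPerYear * drawersPerSystem * socketsPerDrawer;
            System.out.println("order-of-magnitude chips per year: " + chipsPerYear); // 16,000
            // Spares, sysplex/DR clusters, and partially populated frames move this
            // around, but "tens of thousands" is the right scale.
        }
    }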
Merrill 20 hours ago [-]
A good configuration is a pair of mainframes running parallel sysplex in two data centers on separate power and communications grids, with another pair more than 200 miles away for disaster recovery, and maybe a 5th in a single data center for more redundant data backup. Then you are set to do overnight runs to calculate where a few trillion dollars went that day.
iamwpj 1 days ago [-]
It's probably 100k in one run and then they are stored for future use.
rbanffy 1 days ago [-]
And since they can sell the machine fully configured, sales of single CPUs or drawers must not be a big thing. They will keep stock for replacements, but I agree we won't see them doing multiple batches often. By now they must be well into the development of the z18 and z19 processors. At this year's Hot Chips I'd expect them to show the next POWER and, in 2026, the next Z, hitting GA in 2027 or 2028.
belter 1 days ago [-]
The mainframe in 2025 is absolutely at the edge of technology. For some ML algorithms where massive GPU parallelism is not a benefit, it could even make a strong comeback.
I got so jealous of some colleagues that I once even considered getting into mainframe work. CPU at 5.5 GHz continuously (not peak...), massive caches, really, really non-stop...
Look at this tech porn: "IBM z17 Technical Introduction" - https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
The joke is that the chip is meant for datacentre workloads, so parallelization is a given.
LargoLasskhyfv 3 hours ago [-]
Sure. It's just that they can be clustered too. So they can be the overwulf.
Whoo hooo hoooo howl...
bell-cot 1 days ago [-]
Interesting to compare this to ZFS's ARC / MFU vs MRU / Ghost / L2ARC / etc. strategy for (disk) caching. IIRC, those were mostly IBM-developed technologies.