When I was at Uber back in 2015, my org was trying to convert zip-code-based geo partitioning to a hexagon-based scheme. Instead of partitioning a city into, on average, tens of zip codes, we would partition the city into potentially hundreds of thousands of hexagons and dynamically create areas. The first launch was in Phoenix, and the team responsible for the launch stayed up all night for days because they could barely scale our demand-pricing systems. And then the global launch of the feature was delayed first by days, then by weeks, and then by months.
It turned out Uber engineers just loved Redis. Need to distribute your work? Throw it at Redis. I remember debating with some infra engineers about why we couldn't just throw more redis/memcached nodes at our telemetry system to scale it, but I digress. So, the price service we built was based on Redis. The service fanned out millions of requests per second to redis clusters to get information about individual hexagons of a given city, and then computed dynamic areas. We would need dozens of servers just to compute for a single city. I forget the exact number, but let's say it was 40 servers per average-sized city. Now multiply that by the 200+ cities we had. It was just prohibitively expensive, to say nothing of the other scalability bottlenecks that come with managing such scale.
The solution was actually pretty simple. I took a look at the algorithms we used, and it was really just that we needed to compute multiple overlapping shapes. So, I wrote an algorithm that used work-stealing to compute the shapes in parallel per city on a single machine, and used Elasticsearch to retrieve hexagons by a number of attributes -- it was actually a perfect use case for a search engine because the retrieval required boolean queries over multiple attributes. The rationale was pretty simple too: we needed to compute repeatedly on the same set of data, so we should retrieve the data only once for multiple computations. The algorithm was merely dozens of lines, and it was implemented and deployed to production over the weekend by this amazing engineer Isaac, who happens to be the author of the H3 library. As a result, we were able to compute dynamic areas for 40 cities, give or take, on a single machine, and the launch was unblocked.
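For readers curious what "fetch once, compute many shapes in parallel" can look like, here is a minimal Java sketch (not Uber's actual code; the Hexagon/Area types and the selection predicate are invented). parallelStream() runs on the JDK's work-stealing ForkJoinPool, so idle cores steal queued per-area tasks:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class DynamicAreas {
        record Hexagon(String id, double demand, double supply) {}   // hypothetical attributes
        record Area(String name, List<String> hexagonIds) {}

        // The city's hexagons are fetched once (e.g. a single Elasticsearch boolean query)
        // and the same in-memory list is reused for every overlapping shape we compute.
        static Map<String, Area> computeCity(List<Hexagon> hexagons, List<String> areaNames) {
            return areaNames.parallelStream()                        // work-stealing via ForkJoinPool
                    .map(name -> computeArea(name, hexagons))
                    .collect(Collectors.toMap(Area::name, a -> a));
        }

        static Area computeArea(String name, List<Hexagon> hexagons) {
            // Placeholder predicate standing in for the real shape computation.
            List<String> ids = hexagons.stream()
                    .filter(h -> h.demand() > h.supply())
                    .map(Hexagon::id)
                    .toList();
            return new Area(name, ids);
        }
    }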
ckrapu 16 hours ago [-]
I love H3. Isaac and Uber did a real service to the geospatial community with that one.
jiggawatts 11 hours ago [-]
To me H3 looked over-engineered and unnecessarily complex. Hexagons don't tile nicely at multiple resolutions, for one! Just overcoming that is decidedly non-trivial.
Implementing Google's S2 is simpler, but it has the same overall benefits as H3 such as a hierarchical data structure.
g9yuayon 9 hours ago [-]
H3's algorithms involve some intricate maths, but the library itself is conceptually simple. Check this page out for some really fun and neat ideas: https://www.redblobgames.com/grids/hexagons/.
Uber internally had extensive research on what kind of grid system to use. In fact, we started with S2 and geohash, but H3 is superior. Long story short, hexagons are like discretized circles, and therefore offer more symmetry than S2 cells[1]. Consequently, hexagons offer more uniform shapes when we compose hierarchical structures. Besides, H3 cells have more consistent sizes at different latitudes, which is very important for Uber to compute supply and demand of cars.
[1] One of the complications is that H3 has to have pentagons to tile the entire world, just like a soccer ball. We can easily see why by Euler's characteristic formula.
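To make the counting concrete, here is the derivation, assuming three faces meet at every vertex of the grid:

    % Euler's formula on the sphere: V - E + F = 2.
    % With P pentagons and H hexagons, each edge is shared by 2 faces and each vertex by 3 faces:
    F = P + H, \qquad E = \frac{5P + 6H}{2}, \qquad V = \frac{5P + 6H}{3}
    V - E + F = \frac{5P + 6H}{3} - \frac{5P + 6H}{2} + (P + H) = \frac{P}{6} = 2
    \;\Rightarrow\; P = 12

So no matter how many hexagons you add, the sphere always needs exactly 12 pentagons; H3 places 12 pentagons at every resolution.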
For anyone doing geo queries it's a powerful tool.
muggermuch 17 hours ago [-]
Cool anecdote, thanks for sharing!
tombert 20 hours ago [-]
I have gotten in arguments with people who over-deploy Redis. Redis is cool, I don't dislike it or anything, but a lot of the time when people use it, it actually slows things down.
Using it, you're introducing network latency and serialization overhead. Sometimes that's worth it, especially if your database is falling over, but a lot of the time people use it and it just makes everything more complex and worse.
If you need to share cached data across processes or nodes, sometimes you have to use it, but a lot of the stuff I work with is partitioned anyway. If your data is already partitioned, you know what works well a lot of the time? A boring, regular hashmap.
Pretty much every language has some thread-safe hashmap in there, and a lot of them have pretty decent libraries to handle invalidation and expiration if you need those. In Java, for example, you have ConcurrentHashMap for simple stuff, and Guava Caches or Caffeine Caches for more advanced stuff.
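To make that concrete, here's a minimal in-process cache sketch with Caffeine (assuming roughly the Caffeine 3.x API); `loadUser` and the `User` type are hypothetical stand-ins for whatever expensive lookup you'd otherwise push into Redis:

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import java.time.Duration;

    public class UserCache {
        record User(String id) {}

        private final Cache<String, User> cache = Caffeine.newBuilder()
                .maximumSize(100_000)                        // bounded size, eviction handled for you
                .expireAfterWrite(Duration.ofMinutes(10))    // TTL-style expiration
                .build();

        public User get(String userId) {
            // Loads and caches on a miss; no network hop, no (de)serialization.
            return cache.get(userId, this::loadUser);
        }

        private User loadUser(String userId) {
            // hypothetical expensive lookup (database, external API, ...)
            return new User(userId);
        }
    }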
Even the slowest [1] local caching implementation will almost certainly be faster than anything that hits the network; in my own testing [2] Caffeine caches have sub-microsecond `put` times, and you don't pay any serialization or deserialization cost. I don't think you're likely to get much better than maybe sub-millisecond times with Redis, even in the same data center, not to mention if you're caching locally that's one less service that you have to babysit.
Again, I don't hate Redis, there are absolutely cases where it's a good fit, I just think it's overused.
[1] Realistic ones, I mean; obviously any of us could artificially construct something that is as slow as we want.
[2] https://blog.tombert.com/posts/2025-03-06-microbenchmark-err... This is my own blog, feel free to not click it. Not trying to plug myself, just citing my data.
My trick is saying no to Redis, full stop. Every project where it was used as a cache only, it developed retention and backup requirements, and every project where it was a key value store, someone built a relational database on top of it.
There’s nothing worse than when someone does the latter. I had to write a tool to remove deletes from the AOF log because someone fucked up ordering of operations big time trying to pretend they had proper transactions.
ysavir 18 hours ago [-]
I love Redis, but my rule is that we should be able to flush the redis data at any time without any problems. Any code that makes that unfeasible is rejected.
vrosas 15 hours ago [-]
I've never done it IRL but I've always wanted to delete my company's redis instances and see what happens, chaos monkey style. If your service breaks because it expected the cache to be there, or your database immediately goes down because of too many requests, you're going to have a bad time _eventually_.
Salgat 7 hours ago [-]
Redis does support persistence, so there are valid use cases where you expect the data to be around.
edoceo 13 hours ago [-]
This is something one could/should simulate in test.
zombiwoof 11 hours ago [-]
Yes, this design rule is very useful
Delomomonl 17 hours ago [-]
I don't get it
I'm using redis only for temp state data like a session (when I can't use a jwt).
Or when I have to scale and need a warmed up cache
Is that bad now?
I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.
lucb1e 10 hours ago [-]
If sessions die when your system reboots, that means you can't reboot the system (update the service) without breaking whatever any users were currently doing on your site or in your software. That does sound bad to me and like a bad fit for Redis the memory cache. (I know it can do persistence optionally but that's what the person above you was complaining about: this is not what it's good at)
Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs), or the default thingy that comes with your framework or programming language? This is built into everything I've ever used, no need to reinvent session storage or overengineer the situation with jwt or some other distributed cryptographic system and key management
physicsguy 8 hours ago [-]
> This is built into everything I've ever used
Ah but in trendy microservices world, it isn’t in many micro frameworks, you have to reinvent it
jiggawatts 11 hours ago [-]
> I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.
The whole design space for this type of API is weirdly under-explored, but there are some well-supported mainstream solutions out there.
Fundamentally, Redis ought to be a NuGet library, a Rust crate, or something like it. It's just a distributed hash table, putting it onto its own servers is a bit bizarre if the only need is caching.
Microsoft's Service Fabric platform and the Orleans library both implement distributed hash tables as fundamental building blocks. Both can trivially be used "just" as a cache to replace Redis, and both support a relatively rich set of features if you need more advanced capabilities.
Of course, there's Scala's Akka and the Akka.NET port also.
eitland 7 hours ago [-]
I wonder if you're thinking of (things like) Hazelcast?
It is a JVM based "shared cache" so it can be used to transparently share the results of expensive queries - but also to share sessions. It mostly just works, but the free version has some issues when one upgrades data models.
I know half the people here probably loathe the JVM, but once one is aware of one implementation I guess it should be possible to find similar things for .NET and maybe also Go and Python.
neonsunset 9 hours ago [-]
I think you could make Garnet work as a library. Or, at the very least, use FASTER/Tsavorite KV for that instead.
jiggawatts 5 hours ago [-]
Garnet, like Redis, is explicitly designed to be remotely accessed over the network, which is frankly disappointing and derivative.
Microsoft could do better than that!
For example, Azure App Service could use an out-of-process shared cache feature so that web apps could have local low-latency caches that survive app restarts.
fabian2k 19 hours ago [-]
I prefer caching in memory, but a major limitation once you have more than one process is invalidation. It's really only easy for stuff you can cache and just expire on time, not if you need to invalidate it. At that point you need to communicate between your processes (or all of them need to listen to the DB for events).
tombert 19 hours ago [-]
Yeah, if you need to do things across processes then something like Redis or memcached might be necessary.
The thing that bothers me is people adding it in places that don't make sense; I mentioned in a sibling thread that I've seen people use it as a glorified global variable in stuff like Kafka streaming. Kafka's stuff is already partitioned, you likely don't gain anything from Redis compared to just keeping a local map, and at that point you can just use a Guava Cache and let it handle invalidation in-process.
koolba 18 hours ago [-]
Not just across concurrent processes, but also serial ones. Externalizing a cache into something like Redis lets you bounce your process with no reload time. You can get around it for some things like web sessions with a signed cookie, but that opens up expiration and invalidation issues.
But that doesn’t work for caching non trivial calculations or intermediate state. There’s a sweet spot for transitory persistence.
blazing234 13 hours ago [-]
I think the crazy thing is people think redis is the only thing that caches in memory.
You could throw a bunch of your production data in SSAS tabular and there you go you have an in memory cache. I've actually deployed that as a solution and the speed is crazy.
Elucalidavah 19 hours ago [-]
> need to listen to the DB for events
You could store the key->version separately, and read the said version.
If the cached version is lower, it's a cache miss.
Of course, evicting something from cache (due to memory constraints) is a bit harder (or less efficient) in such setup.
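A minimal in-process sketch of that versioned-key protocol (all names invented for illustration); in practice the version map would be the small shared piece, e.g. a tiny table or KV store, while the cached values stay local:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    public class VersionedCache<K, V> {
        private record Entry<T>(long version, T value) {}

        // Kept in-process here only to keep the sketch self-contained.
        private final ConcurrentHashMap<K, Long> versions = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<K, Entry<V>> cache = new ConcurrentHashMap<>();

        public void invalidate(K key) {
            versions.merge(key, 1L, Long::sum);          // bump the version; stale entries become misses
        }

        public V get(K key, Function<K, V> loader) {
            long current = versions.getOrDefault(key, 0L);
            Entry<V> e = cache.get(key);
            if (e == null || e.version() < current) {    // cached version lower -> cache miss
                e = new Entry<>(current, loader.apply(key));
                cache.put(key, e);
            }
            return e.value();
        }
    }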
Seattle3503 11 hours ago [-]
I wonder if there are language neutral alternatives to Infinispan.
evil-olive 18 hours ago [-]
an antipattern I've observed when giving system design interviews is that a lot of people, when faced with a performance problem, will throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not.
for example, if the problem we're talking about is related to slow _writes_, not slow reads, the typical usage of a cache isn't going to help you at all. implementing write-through caching is certainly possible, but has additional pitfalls related to things like transactional integrity between your cache and your authoritative data store.
lucb1e 10 hours ago [-]
> throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not
Could be worse: you could have met me! I used to laugh at caching and thought that if your website is so slow that you need a caching layer (Wordpress comes to mind), you're just doing it wrong: perhaps you're missing indexes on your database, or you simply can't code properly and made it more complex than necessary (I was young, once). Most of my projects are PHP scripts invoked by Apache, so they have no state and compute everything fresh. This is fine (think <30ms typical page generation time) for 95% of the types of things I make, but in more recent years I had two projects where I really struggled with that non-pragmatic mentality and spent long hours experimenting with different writing strategies (so data wouldn't change as often and MariaDB's built-in optimizations would work better), indexes on low-cardinality columns, indexes on combined columns in specific orders, documenting with each query which index it requires and maps to, optimizing the queries themselves of course, in one experiment writing my own on-disk index file to search through some gigabytes of geospatial data much faster than the database seemed able to, and upgrading the physical hardware from HDD to SSD...
Long story short, I now run Redis and the website is no longer primarily bound by computation power but, instead, roughly equally by bandwidth
I'm still very wary of introducing Redis to projects lest I doom them: it'll inevitably outgrow RAM if I indiscriminately stick things in there, which means turning them off (so far, nearly no links or tools on my website ever turned 404 because they're all on a "keep it simple" WAMP/LAMP stack that can do its thing for many years, perhaps search-and-replacing something like mysql_query() with mysqli->query() every five years but that's about the extent of the maintenance)
So anyway I think we're in agreement about "apply where appropriate" but figured I'd share the counter-example of how one can also be counterproductive in the other direction and that there is something to be said for the pragmatic people that consider/try a cache, which often does help even if there's often a different underlying problem and my perfectionism wouldn't like it
GaryNumanVevo 17 hours ago [-]
It's a super common "new to SRE" behavior to overindex on caching as a silver bullet, especially because literally every DB has mechanisms to scale reads fairly easily. In my experience, redis is often needed when you have a DB team that doesn't want to put in the effort to scale reads
sgarland 16 hours ago [-]
Or when the devs don’t want to rewrite their schemata in a way that would massively reduce I/O requirements.
Then when you lose a cache node, the DB gets slammed and falls over, because when the DB team implemented service-based rate-limiting, the teams cried that they were violating their SLOs so the rate limits were bumped waaaay up.
re-thc 11 hours ago [-]
> an antipattern I've observed when giving system design interviews is that
It's an interview though. Most people just watch youtube videos and "copy and paste" the answer.
In a way it's the format of the interview that's the problem. Similar to leetcode-style interviews, a lot of the time we're not checking for what we need.
Too 7 hours ago [-]
Disagree on this one. In an interview there is no "the answer", it's a dialogue. I've interviewed a lot of people, often using performance related questions, and trust me, there are lots of candidates whose only answer to those is "add a cache", even after multiple follow-up questions or hints like "is there anything else that can be done?", "try thinking outside the box", "what can be done with the database itself", etc. Only a novice interviewer will be fooled by the first answer. If you cannot demonstrate more solutions after that, it shows that you clearly have no experience or problem-solving ability, which is the whole point of the interview to find out, not whether you have studied through a set of common questions.
btw, "scale up" is the second most common answer from those who can't provide better solutions. :)
re-thc 6 hours ago [-]
> and trust me, there are lots of candidates whose only answer to those is "add a cache", even after multiple follow-up questions or hints
My point isn't that the interview can't weed out bad candidates. That's in a way the easy part. The problem is it can't identify not-bad candidates.
The interview is broken because of how standardized it is. It's like a certain game genre and most people will play it the same way. It's more like a memory test.
> In an interview there is no "the answer", it's a dialogue.
It pretends to be, or you assume it is. There are numerous 'tutorials' / videos / guides on system design; it's >90% rehearsed. So again, my point is the interviewee is trained and will give you the standard answer even if you deviate some. There are just too many risks otherwise. If I had a more novel approach I'd risk the interviewer not understanding, or taking longer than the allocated time to finish.
Especially in big tech - interviewers are trained to look for "signals" and not whether you're good or bad. They need to tick certain boxes. Even if you have a "better" answer if it's outside the box it fails.
dcow 10 hours ago [-]
At this point, what format of interview isn’t a problem?
ozim 20 hours ago [-]
I’ve seen the same, like when I just mentioned caching a team mate would hear „implement redis”.
Then I would have to explain „no, we have caching stuff ‚in process’, just use that, our app will use more RAM but that’s what we need„.
vrosas 15 hours ago [-]
I'm a fan of memcache specifically because ALL it can do is be a cache. No one can come in later and add a distributed queue to it. In-memory caching is also underrated, I agree. Using a hashmap and a minuscule TTL (like 5 seconds) can have huge performance benefits depending on your traffic, and it takes like 5 minutes to code up.
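Something like the following sketch is what I mean (names invented): a plain ConcurrentHashMap plus a timestamp check, with staleness bounded by the TTL. Something like `new MicroTtlCache<Config>(5_000)` absorbs bursts of identical lookups without needing any invalidation story at all:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    public class MicroTtlCache<V> {
        private record Stamped<T>(long loadedAtMillis, T value) {}

        private final ConcurrentHashMap<String, Stamped<V>> map = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public MicroTtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        public V get(String key, Supplier<V> loader) {
            long now = System.currentTimeMillis();
            // Reload only if the entry is missing or older than the TTL; otherwise serve it as-is.
            Stamped<V> s = map.compute(key, (k, old) ->
                    (old != null && now - old.loadedAtMillis() < ttlMillis)
                            ? old
                            : new Stamped<>(now, loader.get()));
            return s.value();
        }
    }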
hajimuz 20 hours ago [-]
In most cases it’s not about the speed, it’s about data sharing for containers or distributed systems. Filesystem or in-memory caches don’t work there. I agree that in most cases a normal database is enough though.
tombert 20 hours ago [-]
Yeah I mentioned that, if you need to share stuff between processes or different nodes, then maybe Redis might be a fit.
But I've seen people use Redis as a glorified "global variable" for stuff like Kafka streaming. The data is already partitioned, it's not going to be used across multiple nodes, and now you've introduced another service to look at and made everything slower because of the network. A global hashmap (or cache library, like previously mentioned) would do the job faster, with less overhead, and the code would be simpler.
Salgat 8 hours ago [-]
We use an event database (think Kafka) as our source of truth and we've largely shifted away from redis and elasticsearch in favor of local in-memory singletons. These get pretty big too, up to 6GB in some cases for a single mapping. Since it's all event based data, we can serialize the entire thing to json asynchronously along with the stream event numbers specific to that state and save the file to s3. On startup we can restore the state for all instances and catchup on the remaining few events. The best part is that the devs love being able to just use LINQ on all their "database" queries. We do however have to sometimes write these mappings to be lean to fit in memory for tens of millions of entries, such as only one property we use for a query, then we do a GET on the full object in elasticsearch.
slt2021 18 hours ago [-]
redis is needed to share data with other microservices, which are possibly written in a different language.
polyglot teams, when you have a big data pipeline running in java but need to share data with services written in node/python.
if you don't have multiple isolated microservices, then redis is not needed
antirez 17 hours ago [-]
I believe that the issue is that the culture around Redis usage didn't evolve as much as its popularity. Using it memcached-alike has many legitimate use cases, but it's a very reductive way to use it. For instance, sorted set ranking is something that totally changes the dynamics of what you can and can't do with traditional databases. Similarly, large bitmaps that let you retain very fast real-time one-bit information, for analytics that would otherwise be very hard to do, are another example. Basically, Redis helps a lot more as the company culture around it grows: more patterns are learned, and so forth. But in this regard a failure on the Redis (and my) side is that there isn't a patterns collection book: interviewing folks that handled important use cases (think Twitter) to understand the wins and the exact usage details and data structure usages. Even just learning the writable cache pattern totally changes the dynamics of your Redis experience.
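To illustrate two of those patterns, here is a small sketch assuming the Jedis client (any client exposes the same underlying commands; key names are made up):

    import redis.clients.jedis.Jedis;

    public class RedisPatterns {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Sorted-set ranking: score players, then read a top-10 leaderboard in one call.
                jedis.zincrby("leaderboard", 42.0, "player:123");
                jedis.zincrby("leaderboard", 17.0, "player:456");
                System.out.println(jedis.zrevrangeWithScores("leaderboard", 0, 9));

                // Bitmap analytics: one bit per user id per day; BITCOUNT gives daily active users.
                jedis.setbit("active:2025-01-15", 123L, true);
                jedis.setbit("active:2025-01-15", 456L, true);
                System.out.println(jedis.bitcount("active:2025-01-15"));
            }
        }
    }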
kshitij_libra 13 hours ago [-]
Do you plan to write the book ? I’d like to read it
mleonhard 7 hours ago [-]
Yes, please write it. I will buy and read it.
dinobones 21 hours ago [-]
It’s not about Redis vs not Redis, it’s about working with data that does not serialize well or lend itself well to extremely high update velocity.
Things like: counters, news feeds, chat messages, etc
The cost of delivery for doing these things well with an LSM-based DB or RDB might actually be higher than with Redis. Meaning: you would need more CPUs/memory to deliver this functionality, at scale, than you would with Redis, because of all the overhead of the underlying DB engine.
But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice.
daneel_w 20 hours ago [-]
"But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice."
It's not fine. I feel like you're really stretching it thin here in an almost hand-waving way. There are so many cases at far smaller scale where latency is still a primary bottleneck and a crucial metric for valuable and competitive throughput, and where the decidedly higher latency of pretty much any comparable set of operations performed in a DBMS (like MySQL) will result in a large performance loss compared to a proper key-value store.
An example I personally ran into a few years ago was a basic antispam mechanism (a dead simple rate-limiter) in a telecoms component seeing far below 10k items per second ("QPS"), fashioned exactly as suggested by using already-available MySQL for the counters' persistence: a fast and easy case of SELECT/UPDATE without any complexity or logic in the DQL/DML. Moving persistence to a proper key-value store cut latency to a fraction and more than doubled throughput, allowing for actually processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests, increased competitiveness. Every large customer noticed.
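For what it's worth, a counter like that maps naturally onto Redis's atomic INCR. A minimal fixed-window limiter sketch assuming the Jedis client (key layout and limit are invented, not the actual system):

    import redis.clients.jedis.Jedis;

    public class SmsRateLimiter {
        private static final int LIMIT_PER_MINUTE = 100;   // hypothetical per-sender limit

        // Fixed-window rate limit: one counter per sender per minute, expired automatically.
        public static boolean allow(Jedis jedis, String sender) {
            long window = System.currentTimeMillis() / 60_000;
            String key = "rate:" + sender + ":" + window;
            long count = jedis.incr(key);                   // atomic, no SELECT/UPDATE round-trip
            if (count == 1) {
                jedis.expire(key, 120);                     // old windows clean themselves up
            }
            return count <= LIMIT_PER_MINUTE;
        }
    }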
sgarland 20 hours ago [-]
A well-designed schema in a properly-sized-and-tuned [MySQL, Postgres] instance can and will execute point lookups in a few hundred microseconds.
That said, I agree that if you need a KV store, use a KV store. Though of course, Postgres can get you close out of the box with `CREATE UNLOGGED TABLE kv (data hstore);`.
lmm 11 hours ago [-]
> processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests
The vast majority of companies never need to deal with even one thousand of anything per second. Your situation was absolutely an unusually large scale.
cess11 19 hours ago [-]
I'm sure something other than the MySQL engine itself was the bottleneck in that case, like bad configuration or slow disk or something.
Did you profile the issue?
daneel_w 19 hours ago [-]
Unreplicated MEMORY tables, prepared and cached statements, efficient DDL and sane indices, no contention or locking, no access from multiple sessions, some performance tuning of InnoDB, ample resources, DB not stressed, no difference in pure network latency.
MySQL's query optimizer/planner/parser perform a lot more "gyrations" than Redis or MemcacheDB do before finally reaching the point of touching the datastore to be read/written, even in the case of prepared statements. Their respective complexities are not really comparable.
packetlost 21 hours ago [-]
I've only ever seen Redis used in two scenarios: storing ephemeral cache data to horizontally scale Django applications and for ephemeral job processing where the metadata about the job was worthless.
I reevaluated it for a job processing context a couple of years ago and opted for websockets instead because what I really needed was something that outlived an HTTP timeout.
I've never actually seen it used in a case where it wasn't an architecture smell. The codebase itself is pretty clean and the ideas it has are good, but the idea of externalizing datastructures like that just doesn't seem that useful if you're building something correctly.
ljm 20 hours ago [-]
Redis + Sidekiq was a default for a long time in the Rails world as well, but it’s an unnecessary complication (and expense) for most use cases. Just use your existing DB until you need to seriously scale up, and then look at a message queue.
I’ve used Redis for leaderboards and random matchmaking though, stuff which is doable in postgres but is seriously write-heavy and a bit of a faff. Gives you exactly the sort of goodies you need on top of a K/V store without being difficult to set up.
As for caching - it’s nice to use as an engineer for sure, but pretty pricey. It wouldn’t be my default choice any more.
bdcravens 20 hours ago [-]
Rails is attempting to solve this with Solid Queue, which was inspired heavily by GoodJob, both of which use Postgresql (and more in the case of Solid Queue). Both seem to be fairly capable of "serious scale", at least being equivalent to Sidekiq.
briandear 13 hours ago [-]
A place I work at which I can’t name uses GoodJob at FAANG scale. And it works perfectly. The small startups and lower scale places still reach for sidekiq because they seem to think “it’s faster,” but it ends up being a nightmare for many people because when they do start reaching some scale, their queues and infra are so jacked up that they continually have sidekiq “emergencies.” GoodJob (and SolidQueue) for the win.
I like the sidekiq guy and wish him the best, but for me, the ubiquitous Redis dependency on my Rails apps is forever gone. Unless I actually need a KV store, but even for that, I can get away with PG and not know the difference.
Unfortunately there are still some CTOs out there that haven’t updated their knowledge and are still partying like it’s 2015.
alabastervlog 20 hours ago [-]
Using Redis exclusively remotely never made much sense to me. I get it as a secondary use case (gather stats from a server that’s running Redis, from another machine or something) but if it’s not acting as (effectively) structured, shared memory on a local machine with helpful coordination features, I don’t really get it. It excels at that, but all this Redis as a Service stuff where it’s never on the same machine as any of the processes accessing it don’t make sense to me.
Like you have to push those kinds of use cases if you’re trying to build a business around it, because a process that runs on your server with your other stuff isn’t a SaaS and everyone wants to sell SaaS, but it’s far enough outside its ideal niche that I don’t understand why it got popular to use that way.
pdimitar 20 hours ago [-]
To your last: yep, especially having in mind that Redis is ephemeral. I've had much more success with SQLite + a bunch of stricter validators (as SQLite itself is sadly pretty loose), and more performance too.
gytisgreitai 20 hours ago [-]
Exactly. Lots of people read posts by companies doing millions of qps and then decide that they need redis, kafka, elastic, nosql, etc right from the start. And that complicates things.
We are currently at 500k RPS scale and we have probably around a handful of use cases for Redis and it works great
hinkley 20 hours ago [-]
I worked for a company that had enough customers that AWS had to rearrange their backlog for cert management to get us to come on board, and our ingress didn’t see 10,000 req/s. We put a KV store in front of practically all of our backend services though. We could have used Redis, but memcached was so stable and simple that we just manually sharded by service. We flew too close to the sun trying to make the miss rate in one of the stores a little lower and got bit by OOMKiller.
By the time it was clear we would have been better off with Redis’ sharding solution the team was comfortable with the devil they knew.
lr4444lr 20 hours ago [-]
100% this. Also, is it data whose scale and speed is more important than its durability?
I actually agree with the author that Redis was not the right solution for the situations he was presented with, but he's far from proving it is not the solution for a whole host of other problems.
karmakaze 20 hours ago [-]
Even then you can do a lot of things to spread write contention with an RDBMS.
e.g. MySQL 8.0.1+ adds SKIP LOCKED modifier to SELECT ... FOR UPDATE.
Then you can increment the first available row, otherwise insert a new row. On read aggregate the values.
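A sketch of that pattern over JDBC (table and column names invented; assumes autocommit is off so the row lock is held until the caller commits):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ShardedCounter {
        // Writers each grab whichever shard row is free, so they don't serialize on one hot row.
        public static void increment(Connection conn, String name) throws SQLException {
            try (PreparedStatement pick = conn.prepareStatement(
                    "SELECT id FROM counter_shards WHERE name = ? LIMIT 1 FOR UPDATE SKIP LOCKED")) {
                pick.setString(1, name);
                try (ResultSet rs = pick.executeQuery()) {
                    if (rs.next()) {
                        try (PreparedStatement upd = conn.prepareStatement(
                                "UPDATE counter_shards SET value = value + 1 WHERE id = ?")) {
                            upd.setLong(1, rs.getLong("id"));
                            upd.executeUpdate();
                        }
                    } else {
                        // All existing shard rows are locked by other writers: add a new shard.
                        try (PreparedStatement ins = conn.prepareStatement(
                                "INSERT INTO counter_shards (name, value) VALUES (?, 1)")) {
                            ins.setString(1, name);
                            ins.executeUpdate();
                        }
                    }
                }
            }
        }

        // Readers aggregate across the shard rows.
        public static long read(Connection conn, String name) throws SQLException {
            try (PreparedStatement sum = conn.prepareStatement(
                    "SELECT COALESCE(SUM(value), 0) FROM counter_shards WHERE name = ?")) {
                sum.setString(1, name);
                try (ResultSet rs = sum.executeQuery()) {
                    rs.next();
                    return rs.getLong(1);
                }
            }
        }
    }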
0xbadcafebee 17 hours ago [-]
Software development today is largely just people repeating what other people do without thinking. Which is how human culture works; we just copy what everyone else is doing, because it's easier, and that becomes "normal", whatever it is.
In the software world in the mid 00's, the trend started to work around the latency, cost and complexity of expensive servers and difficult databases by relying on the speed of modern networks and RAM. This started with Memcached and moved on to other solutions like Redis.
(this later evolved into NoSQL, when developers imagined that simply doing away with the complexity of databases would somehow magically remove their applications' need to do complex things... which of course it didn't, it's the same application, needing to do a complex thing, so it needs a complex solution. computers aren't magic. we have thankfully passed the hype cycle of NoSQL, and moved on to... the hype cycle for SQLite)
But the tradeoff was always working around one limitation by adding another limitation. Specifically it was avoiding the cost of big databases and the expertise to manage them, and accepting the cost of dealing with more complex cache control.
Fast forward to 2025 and databases are faster (but not a ton faster) and cheaper (but not a ton cheaper) and still have many of the same limitations (because dramatically reinventing the database would have been hard and boring, and no software developer wants to do hard and boring things, when they can do hard and fun things, or ignore the hard things with cheap hacks and pretend there is no consequence to that).
So people today just throw a cache in between the database, because 1) databases are still kind of stupid and hard (very very useful, but still stupid and hard) and 2) the problems of cache complexity can be ignored for a while, and putting off something hard/annoying/boring until later is a human's favorite thing.
No, you don't need Redis. Nobody needs Redis. It's a hack to avoid dealing with stateless applications using slow queries on an un-optimized database with no fast read replicas and connection limits. But that's normal now.
cmbothwell 7 hours ago [-]
This hits at the true nature of the problem which has _nothing_ to do with Redis at all (which is a fine piece of technology written by a thoughtful and conscientious creator) and has everything to do with the fact that our industry at large encourages very little thinking about the problems we are trying to solve.
Hence, fads dominate. I hate to sound so cynical but that has been my experience in every instance of commercial software development.
edoceo 12 hours ago [-]
> hype cycle for SQLite
Drop Redis, replace with in-memory SQLite.
But for real, the :memory: feature is actually pretty awesome!
paulryanrogers 1 hours ago [-]
That won't help if you need a centralized cache or centralized KV store, like for sessions.
briandear 13 hours ago [-]
Recently just left a MongoDB project. A total nightmare.
bassp 20 hours ago [-]
I agree with the author 100% (the TanTan anecdote is great, super clever work!), but.... sometimes you do need Redis, because Redis is the only production-ready "data structure server" I'm aware of
If you want to access a bloom filter, cuckoo filter, list, set, bitmap, etc... from multiple instances of the same service, Redis (slash valkey, memorydb, etc...) is really your only option
It also has arrays, sets, and bitstrings, though for the latter you can just as easily (and with less space consumed) map it in your app, and store an integer.
jasonthorsness 20 hours ago [-]
Yes, while the default idea of Redis might be to consider it a key/value cache, the view of the project itself is definitely about being a "data structure server" - it's right at the top of the https://github.com/redis/redis/blob/unstable/README.md and antirez has focused on that (I can't find one quote I am looking for specifically but it's evident for example in discussion on streams https://antirez.com/news/114). Although I've definitely seen it be used just as a key/value store in the deployments I'm familiar with ¯\_(ツ)_/¯
e_hup 19 hours ago [-]
All of those can be serialized and stored in an RDBMS. You don't need Redis for that.
bassp 19 hours ago [-]
They can (and that's probably the right choice for a lot of use cases, especially for small data structures and infrequently updated ones), but serializing and storing them in a database requires you to (in your application code) implement synchronization logic and pay the performance cost for said logic; for instance, if you want to `append` to a shared list, you need to deserialize the list, append to the end of it in your application code, and write it back to the DB. You'd need to use some form of locking to prevent appends from overwriting each other, incurring a pretty hefty perf penalty for hot lists. Also, reading an entire list/tree/set/whatever back just to add/delete one element is very wasteful (bandwidth/[de]serialization cost-wise)
evil-olive 18 hours ago [-]
> for instance, if you want to `append` to a shared list, you need to deserialize the list, append to the end of it in your application code, and write it back to the DB.
this seems like a classic case of impedance mismatch, trying to implement a Redis-ism using an RDBMS.
for a shared list in a relational database, you could implement it like you've said, using an array type or a jsonb column or whatever, and simulate how it works in Redis.
but to implement a "shared list" in a way that meshes well with the relational model...you could just have a table, and insert a row into the table. there's no need for a read-modify-write cycle like you've described.
or, if you really need it to be a column in an existing table for whatever reason, it's still possible to push the modification to the database without the heavy overhead. for example [0]:
> The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
Sure, but that’s not what the person responding to my original comment was suggesting :). They suggested that you serialize entire data structures (bloom filters, lists, sets, etc…) into a relational DB to get redis-like functionality out of it; I chose a list as an example to illustrate why that’s not a great option in many cases.
You’re right that managing lists in RDBMSes is easy-ish, if you don’t have too many of them and they’re not too large. But, like I mentioned in my original comment, redis really shines as a complex data structure server. I wouldn’t want to implement my own cuckoo filter in Postgres!
kflgkans 19 hours ago [-]
You might not need a cache. In my previous company (~7 years) all the teams around me were introducing caches left and right and getting into a lot of complexity and bugs. I persevered and always pushed back on adding caches to apps in my team, instead focusing on improving the architecture and seeking other performance improvements. I can proudly say my teams stayed cache-free for those 7 years.
superq 18 hours ago [-]
The issues that I have with Redis are not at all its API (which is elegant and brilliant) or even its serialized, single-core, single-threaded design, but its operational hazards.
As a cache or ephemeral store for things like throttling/rate limiting, lookup tables, or perhaps even sessions, it's great; but it's impossible to rely on the persistence options (RDB, AOF) for production data stores.
You usually only see this tendency with junior devs, though. It might be a case where "when all you have is a hammer, all you see are nails", or when someone discovers Redis (or during the MongoDB hype cycle ten years ago), which seems like it's in perfect alignment with their language datatypes, but perhaps this is mostly because junior devs don't have as many production-ready databases (from SQL like Postgresql, CockroachDB, Yugabyte to New/NoSQL like ScyllaDB, YDB, Aerospike) to fall back on.
Redis shines as a cache for small data values (probably switch to memcache for larger values, which is simpler key-value but generally 3 to 10 times faster for that more narrow use case, although keep an eye on memory fragmentation and slab allocation)
Just think carefully before storing long-term data in it. Maybe don't store your billing database in it :)
noisy_boy 18 hours ago [-]
I have seen a horrifying use of Redis: I inherited the maintenance of an application whose original developer implemented his own home-grown design to manage relationships between different types of key-value pairs, pretending they were tables, complete with cross-referencing logic; it took me a week just to add test cases with sufficient logging to reveal the "schema" and mutation logic. All this with the non-technical manager wondering why it took so long to make the change, which directly depended on understanding this. To top it all off, the code was barely better than spaghetti, with less than ten lines of comments across maybe 5k LOC. The irony was that this was not a latency-sensitive application - it did data quality checks and could have been implemented in a much cleaner and more flexible way using, e.g., PostgreSQL.
intelVISA 14 hours ago [-]
Non-technical 'manager'
dimgl 12 hours ago [-]
I'm really surprised that the pendulum has swung so far in the other direction that people are recommending not to use Redis.
Sure, don't introduce a data store into your stack unless you need it. But if you had to introduce one, Redis still seems like one of the best to introduce? It has fantastic data structures (like sorted sets, hash maps), great performance, robust key expiry, low communication overhead, low runtime overhead... I mean, the list goes on.
esafak 12 hours ago [-]
I'd like to draw attention to its probabilistic data structures, in particular: HyperLogLog, Bloom filter, Cuckoo filter, t-digest, Top-K, Count-min sketch
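For example, HyperLogLog gives approximate distinct counts in at most ~12 KB per key. A tiny sketch assuming the Jedis client (key names made up):

    import redis.clients.jedis.Jedis;

    public class UniqueVisitors {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // PFADD records elements into a small fixed-size sketch, regardless of cardinality.
                jedis.pfadd("visitors:2025-01-15", "user:1", "user:2", "user:3");
                jedis.pfadd("visitors:2025-01-15", "user:2");   // duplicates don't inflate the count
                // PFCOUNT returns an approximate distinct count (standard error around 0.81%).
                System.out.println(jedis.pfcount("visitors:2025-01-15"));
            }
        }
    }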
Yeah, my first thought was, “I don’t need redis, but I want redis”.
progbits 19 hours ago [-]
Redis as ephemeral cache is ok, but nothing extra.
Redis as transactional, distributed and/or durable storage is pretty poor. Their "active active" docs on conflict resolution, for example, don't fill me with confidence given there is no formalism, just vague examples. But this comes from people who not only don't know how to do distributed locks, but refuse to learn when issues are pointed out to them: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
Every time I find code that claims to do something transactional in Redis which is critical for correctness, not just latency optimization, I get worried.
igortg 20 hours ago [-]
I followed this rationale in a small project and opted for PostgreSQL pub/sub instead of Redis. But I went through so much trouble correctly handling PostgreSQL disconnections that I wonder if Redis wouldn't have been the better choice.
alberth 15 hours ago [-]
> A single beefy Redis (many cores, lots of RAM type of machine) should be able to handle the load
I thought Redis was single threaded running on a single core.
Having multiple cores provides no benefit (and arguably could hurt since large multicore systems typically have a lower clock)
sriku 14 hours ago [-]
Redis itself, yes. However there are Redis compatible tools that go beyond single core - KeyDB, DragonFly and ARDB overcome some limitations.
bdcravens 20 hours ago [-]
Another category is using Redis indirectly via dependencies. For example in Rails, Sidekiq is a common background job library. However, there are now Postgresql-backed options (like GoodJob and the baked-in Solid Queue, which supports other RDBMSes as well)
jherdman 19 hours ago [-]
I just wanted to take a moment to highlight both GoodJob and Solid Queue. They're excellent choices, and any RoR folks reading these comments should give them a fair shake.
khaki54 17 hours ago [-]
I was reading an article last week where the author did a bunch of testing and found Redis is only 3-5x faster than just using a multi model NoSQL. Basically it rarely makes sense to use Redis.
https://adamfowler.org/2015/09/07/hybrid-nosql-marklogic-as-...
dcmatt 17 hours ago [-]
"only 3-5x faster... rarely makes sense to use Redis" This is an illogical conclusion. 3-5x faster is substantial!
lucb1e 9 hours ago [-]
I agree, although it also depends on the context. If a page needs to fetch lots of data and takes 3 full seconds to load, the middle case (4x improvement) makes it go from an amount of time where you glance at your messages to on par with the average javascript-laden website we're used to anyway (except in this case it's legitimate loading time). If, however, we're talking about most pages, that are mostly static and just fetch some article contents and author name from the DB in 2 milliseconds... yeah, it can be 10x faster and nobody cares
mannyv 20 hours ago [-]
Caching is a funny thing. Just like anything you need to understand if and why you need it.
And one other thing is you should be able to fall back if your cache is invalidated.
In our case we keep a bunch of metadata in redis that’s relatively expensive (in cost and speed) to pull/walk in realtime. And it needs to be fast and support lots of clients. The latter sort of obviates direct-to-database options.
simonw 15 hours ago [-]
If you work at a company where teams keep on turning to Redis for different features, there's a chance that it's an indication that the process for creating new database tables in your relational store has too much friction!
sgarland 20 hours ago [-]
> 'Why can't we just store this data on the shards in PostgreSQL, next to the swipes?'. The data itself would be microscopic in comparison and the additional load would also be microscopic in comparison to what these servers were already doing.
I'm assuming based on the rest of the article that the author and team knew what they were doing, but if you aren't familiar with Postgres' UPDATE strategy and HOT updates [0], you should familiarize yourself before attempting this, otherwise, you're going to generate a massive amount of WAL traffic and dead tuples at scale.
I think people forget (or don’t know) that adding a data storage system to your architecture also involves management, scaling, retention, and backup. It’s not free.
And second, sometimes you do need to invest in storage to permit failover or to minimize latency but people do it for traffic when they really have little traffic. A server from 20 years ago could push a lot of traffic and systems have gotten only beefier.
notjoemama 16 hours ago [-]
> The first usage was to cache the result of an external API call to lookup geolocation of users, so that the service in question could process location requests quicker, and also run cheaper.
BAAAHHH HAAHA HA HA HA HHHAA. Ok, look. Just because it's in Redis does not disqualify the clause in the geo service's license that says you may NOT permanently store the data. The author did not say that's what they were doing, but a very similar thing came up at a previous workplace for me, and we chose to call the service for every lookup the way the service company expected their customers to do. I recall the suggestion of using a caching database as a workaround, and it was not made by the engineering team. Sorry, I'm still chuckling at this one...
inib 15 hours ago [-]
We use Redis to store geocoding data from the Google Geocoding API with a TTL of 30 days, per the policy. It's $4/1000 requests. There'd be no business without the cache.
dimgl 12 hours ago [-]
I don't understand this comment. Without caching, you'd be at the mercy of the geolocation company. I've deployed a service that does exactly this.
lukaslalinsky 19 hours ago [-]
For years I've tried to use Redis as a persistent data store. I've only been dissatisfied, having bad experiences with both sentinel and cluster. Most of my service outages were linked to Redis replication getting broken.
Then I decided to give up and use it only as an empehemral cache. I have a large number of standalone Redis instances (actually, now they are Valkey), no storage, only memory, and have Envoy proxy on top of them for monitoring and sharding. And I'm really happy with it, storing hundreds of GBs of data there, if one goes down, only a small part of the data needs to be reloaded from the primary source, and with the Envoy proxy, applications see it as a single Redis server. I was considering just replacing it with memcached, but the redis data model is more rich, so I kept using it, just not expecting anything put there to be actually stored forever.
anonymousDan 14 hours ago [-]
What were the problems you had when using it as a persistent data store if you don't mind me asking? Also, what kind of workloads did you have?
lukaslalinsky 7 hours ago [-]
The problematic use case was using Redis as a counter for various app-related metrics, which I want to make visible to the customer later on. The app was writing to Redis, and a background job was reading these partially aggregated metrics from Redis and writing them to another database. So you could say it was a remote buffer. On way too many occasions, after a short failure of the host where the current master instance was running, the replica was not promoted to master. That was using redis sentinel. I guess my main issue was always redis sentinel.
alberth 14 hours ago [-]
This post seems less about Redis, and more broadly about - why you should never introduce new technologies into your existing stack (only do so when your existing stack can’t keep up).
vvpan 19 hours ago [-]
Really the most useful feature in Redis to me is TTL - so easy to cache things and not worry about them being too stale. The second most useful is that you can attach something like BullMQ to it and get cron/scheduling for a couple lines of code.
sorokod 21 hours ago [-]
So premature optimization causes Redis?
rc_mob 16 hours ago [-]
Yeah I just removed redis from my php app. It runs slightly slower, but not enough that my users would notice. Redis was overkill.
That's a great project! I'm definitely going to give it a try.
phendrenad2 19 hours ago [-]
I like to run a single redis server process on each instance that's serving the webapp, and also have a shared redis cluster for shared cached state (like user sessions). That way there's a fast local cache for people to use for caching API calls and other things where duplication is preferable to latency.
tengbretson 19 hours ago [-]
If it's purely ephemeral and not replicated between instances what are you getting from redis that couldn't be done with SQLite or even a good old hashmap in a separate thread?
phendrenad2 18 hours ago [-]
Redis is probably slightly faster than sqlite, and developers are used to using redis primitives and APIs.
ricardobeat 17 hours ago [-]
You get standardized data structures and consistent operations on them for free.
dorianniemiec 20 hours ago [-]
I have found that for websites that use Joomla, file cache is faster than Redis cache, at least for me.
lucb1e 9 hours ago [-]
What data is cached in there?
And you're almost certainly still caching it to RAM, not sure if you're aware that the kernel keeps in-memory copies of recently-read/written pieces of files
jayd16 20 hours ago [-]
I agree that Redis can be an architectural smell. You can shoehorn it in where it doesn't belong.
It's easy to work with and does its job so well that for a certain type of dev it's easier to implement caching than to understand enough SQL to add proper indices.
tengbretson 19 hours ago [-]
It's funny, because once you go through the effort of determining which query fields you should compose to make your cache key, you're 95% of the way to the index you should have had in the first place.
briandear 13 hours ago [-]
Rails adding SolidCache and SolidQueue (or Good Job) was what made me not even think of Redis anymore in the rails world.
cyberax 14 hours ago [-]
Redis (or Valkey) has a bunch of functionality other than caching. It has a very useful event streaming system and task queueing system.
You can use Postgres instead of them, but then it can easily become a bottleneck.
xyzzy9563 19 hours ago [-]
Redis is useful if you have multiple backend servers that need to synchronize state. If you don’t have this problem then improving your caching I.e. using Cloudflare is probably the best approach, along with local caching on a single server.
evidencetamper 21 hours ago [-]
> Redis is arguably the most well regarded tech of all.
REDIS tradeoffs have been so widely discussed because many, many engineers disagree with them. REDIS is so lowly regarded that some companies ban it completely, making their engineers choose between memcached or something more enterprisey (hazelcast these days, Coherence some time ago).
virtue3 21 hours ago [-]
Redis is probably one of the most well written pieces of software I’ve ever looked at.
The source code is magnificent.
foobarian 19 hours ago [-]
I second this, specifically the geohash code is a work of art.
pphysch 20 hours ago [-]
The original maintainer went on to write a sci-fi novel
do_not_redeem 21 hours ago [-]
This comment has a lot of words but says nothing. "Many engineers" disagree with every piece of software, and "some companies" ban every piece of software.
Why are you writing Redis in all caps like it's an acronym? Reminds me of those old dog C programmers who write long rants about RUST with inaccuracies that belie the fact they've never actually used it.
karmakaze 20 hours ago [-]
If anything it's ReDiS, Remote Dictionary Server. Also pronounced to sound like 'remote', unlike the color of its logo (which would be spelled Reddis).
Xiol32 20 hours ago [-]
Same vibes as people who write SystemD.
throwaway7783 21 hours ago [-]
Why is Redis regarded so lowly?
Now, Redis Search, I understand. Very buggy; the search index misses records and doesn't tell you that it missed them, and so on.
Core Redis is a solid piece of technology, afaik
Joel_Mckay 20 hours ago [-]
Probably the same reasons dynamically-scaled Aerospike is not as popular. Redis primarily has a reputation of just "caching" stuff in memory, but probably the rate of CVEs in the distributed clients soured some people's opinions.
Distributed systems is hard, and what seems like a trivial choice about design constraints can quickly balloon in cost. =3
tayo42 20 hours ago [-]
I work at a company that banned redis after an outage. Kind of blunt; it feels like such an amateur decision to have made. So reactionary, instead of trying to learn it. It's a large, popular internet company too.
secondcoming 20 hours ago [-]
Was it in the early days of redis?
I have a dedicated server that's running an instance of redis on each of the 32 cores and it's been up for over a year now. Each core is doing about 30k QPS
21 hours ago [-]
Rendered at 14:23:26 GMT+0000 (UTC) with Wasmer Edge.
It turned out Uber engineers just loved Redis. Having a need to distribute your work? Throw that to Redis. I remember debating with some infra engineers why we couldn't throw in more redis/memcached nodes to scale our telemetry system, but I digressed. So, the price service we built was based on Redis. The service fanned out millions of requests per second to redis clusters to get information about individual hexagons of a given city, and then computed dynamic areas. We would need dozens of servers just to compute for a single city. I forgot the exact number, but let's say it was 40 servers per an average-sized city. Now multiply that by the 200+ cities we had. It was just prohibitively expensive, let alone that there couldn't other scalability bottlenecks for managing such scale.
The solution was actually pretty simple. I took a look at the algorithms we used, and it was really just that we needed to compute multiple overlapping shapes. So, I wrote an algorithm that used work-stealing to compute the shapes in parallel per city on a single machine, and used Elasticsearch to retrieve hexagons by a number of attributes -- it was actually a perfect use case for a search engine because the retrieval requires boolean queries of multiple attributes. The rationale was pretty simple too: we needed to compute repetitively on the same set of data, so we should retrieve the data only once for multiple computations. The algorithm was of merely dozens of lines, and was implemented and deployed to production over the weekend by this amazing engineer Isaac, who happens to be the author of the library H3. As a result, we were able to compute dynamic areas for 40 cities, give or take, on a single machine, and the launch was unblocked.
Implementing Google's S2 is simpler, but it has the same overall benefits as H3 such as a hierarchical data structure.
Uber internally had extensive research on what kind of grid system to use. In fact, we started with S2 and geo-hash, but H3 is superior. Long story short, hexagons are like discretized circles, and therefore offer more symmetry than S2 cells[1]. Consequently, hexagons offer more uniform shapes when we compose hierarchical structures. Besides, H3 cells have more consistent sizes in different latitudes, which is very important for uber to compute supply and demand of cars.
[1] One of the complications is that H3 has to have pentagons to tile the entire world, just like a soccer ball. We can easily see why by Euler's characteristic formula.
For anyone doing geo queries it's a powerful tool.
Using it, you're introducing network latency and serialization overhead. Sometimes that's worth it, especially if your database is falling over, but a lot of the time people use it and it just makes everything more complex and worse.
If you need to share cached data across processes or nodes, sometimes you have to use it, but a lot of the stuff I work with is partitioned anyway. If your data is already partitioned, you know what works well a lot of the time? A boring, regular hashmap.
Pretty much every language has some thread-safe hashmap in there, and a lot of them have pretty decent libraries to handle invalidation and expiration if you need those. In Java, for example, you have ConcurrentHashMap for simple stuff, and Guava Caches or Caffeine Caches for more advanced stuff.
Even the slowest [1] local caching implementation will almost certainly be faster than anything that hits the network; in my own testing [2] Caffeine caches have sub-microsecond `put` times, and you don't pay any serialization or deserialization cost. I don't think you're likely to get much better than maybe sub-millisecond times with Redis, even in the same data center, not to mention if you're caching locally that's one less service that you have to babysit.
Again, I don't hate Redis, there are absolutely cases where it's a good fit, I just think it's overused.
[1] Realistically slow, I mean; obviously any of us could artificially construct something that is as slow as we want.
[2] https://blog.tombert.com/posts/2025-03-06-microbenchmark-err... This is my own blog, feel free to not click it. Not trying to plug myself, just citing my data.
There’s nothing worse than when someone does the latter. I had to write a tool to remove deletes from the AOF log because someone fucked up ordering of operations big time trying to pretend they had proper transactions.
I'm using redis only for temp state data like a session (when I can't use a jwt).
Or when I have to scale and need a warmed up cache
Is that bad now?
I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.
Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs), or the default thingy that comes with your framework or programming language? This is built into everything I've ever used, no need to reinvent session storage or overengineer the situation with jwt or some other distributed cryptographic system and key management
Ah but in trendy microservices world, it isn’t in many micro frameworks, you have to reinvent it
The whole design space for this type of API is weirdly under-explored, but there are some well-supported mainstream solutions out there.
Fundamentally, Redis ought to be a NuGet library, a Rust crate, or something like it. It's just a distributed hash table; putting it onto its own servers is a bit bizarre if the only need is caching.
Microsoft's Service Fabric platform and the Orleans library both implement distributed hash tables as fundamental building blocks. Both can trivially be used "just" as a cache to replace Redis, and both support a relatively rich set of features if you need more advanced capabilities.
Of course, there's Scala's Akka and the Akka.NET port also.
It is a JVM-based "shared cache", so it can be used to transparently share the results of expensive queries - but also to share sessions. It mostly just works, but the free version has some issues when one upgrades data models.
I know half the people here probably loathe the JVM, but once one is aware of one implementation, I guess it should be possible to find similar things for .NET and maybe also Go and Python.
Microsoft could do better than that!
For example, Azure App Service could use an out-of-process shared cache feature so that web apps could have local low-latency caches that survive app restarts.
The thing that bothers me is people adding it in places that don't make sense; I mentioned in a sibling thread that I've seen people use it as a glorified global variable in stuff like Kafka streaming. Kafka's stuff is already partitioned; you likely don't gain anything from Redis compared to just keeping a local map, and at that point you can just use a Guava Cache and let it handle invalidation in-process.
But that doesn’t work for caching non-trivial calculations or intermediate state. There’s a sweet spot for transitory persistence.
You could throw a bunch of your production data in SSAS tabular and there you go you have an in memory cache. I've actually deployed that as a solution and the speed is crazy.
You could store the key->version mapping separately, and read that version first. If the cached version is lower, it's a cache miss.
Of course, evicting something from the cache (due to memory constraints) is a bit harder (or less efficient) in such a setup.
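A rough sketch of that versioning idea in Java; the ConcurrentHashMap standing in for the authoritative version store, and all the names, are just for illustration:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class VersionedCache<K, V> {
        private record Entry<T>(long version, T value) {}

        private final Map<K, Long> versions = new ConcurrentHashMap<>();  // stand-in for the authoritative version store
        private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

        V get(K key, Function<K, V> loader) {
            long latest = versions.getOrDefault(key, 0L);
            Entry<V> cached = cache.get(key);
            if (cached == null || cached.version() < latest) {  // missing or stale version: treat as a miss
                cached = new Entry<>(latest, loader.apply(key));
                cache.put(key, cached);
            }
            return cached.value();
        }

        void invalidate(K key) {
            versions.merge(key, 1L, Long::sum);                 // writers bump the version instead of deleting
        }
    }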
for example, if the problem we're talking about is related to slow _writes_, not slow reads, the typical usage of a cache isn't going to help you at all. implementing write-through caching is certainly possible, but has additional pitfalls related to things like transactional integrity between your cache and your authoritative data store.
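To illustrate the shape of the problem, here is a bare-bones write-through sketch (all names invented); the gap between the two writes is exactly where the transactional-integrity pitfalls live:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class WriteThroughCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Database db;                       // stand-in for the authoritative store

        WriteThroughCache(Database db) { this.db = db; }

        void put(String key, String value) {
            db.write(key, value);                        // 1) write the source of truth
            // If we crash or the next line fails, cache and DB disagree until the
            // entry expires or is invalidated some other way.
            cache.put(key, value);                       // 2) then update the cache
        }

        String get(String key) {
            return cache.computeIfAbsent(key, db::read); // read-through on a miss
        }
    }

    interface Database {
        void write(String key, String value);
        String read(String key);
    }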
Could be worse: you could have met me! I used to laugh at caching and thought that if your website is so slow that you need a caching layer (Wordpress comes to mind), you're just doing it wrong: perhaps you're missing indexes on your database or you simply can't code properly and made it more complex than necessary (I was young, once). Most of my projects are PHP scripts invoked by Apache, so they have no state and compute everything fresh. This is fine (think <30ms typical page generation time) for 95% of the types of things I make, but in more recent years I had two projects where I really struggled with that non-pragmatic mentality and spent long hours experimenting with different writing strategies (so data wouldn't change as often and MariaDB's built-in optimizations would work better), indexes on low-cardinality columns, indexes on combined columns in specific orders, documenting with each query which index it requires and maps to, optimizing the query itself of course, in one experiment writing my own on-disk index file to search through some gigabytes of data much faster than the database seemed to be able to do for geospatial information, upgraded the physical hardware from HDD to SSD...
Long story short, I now run Redis and the website is no longer primarily bound by computation power but, instead, roughly equally by bandwidth
I'm still very wary of introducing Redis to projects lest I doom them: it'll inevitably outgrow RAM if I indiscriminately stick things in there, which means turning them off (so far, nearly no links or tools on my website ever turned 404 because they're all on a "keep it simple" WAMP/LAMP stack that can do its thing for many years, perhaps search-and-replacing something like mysql_query() with mysqli->query() every five years but that's about the extent of the maintenance)
So anyway, I think we're in agreement about "apply where appropriate", but I figured I'd share the counter-example of how one can also be counterproductive in the other direction. There is something to be said for the pragmatic people who consider/try a cache: it often does help, even if there's often a different underlying problem my perfectionism wouldn't like.
Then when you lose a cache node, the DB gets slammed and falls over, because when the DB team implemented service-based rate-limiting, the teams cried that they were violating their SLOs so the rate limits were bumped waaaay up.
It's an interview though. Most people just watch youtube videos and "copy and paste" the answer.
In a way it's the format of the interview that's the problem. Similar to leet code style interviews a lot of the times we're not checking for what we need.
btw, "scale up" is the second most common answer from those who can't provide better solutions. :)
My point isn't that the interview can't weed out bad candidates. That's in a way the easy part. The problem is it can't identify not-bad candidates.
The interview is broken because of how standardized it is. It's like a certain game genre and most people will play it the same way. It's more like a memory test.
> In an interview there is no "the answer", it's a dialogue.
It pretends to be, or you assume it is. There are numerous 'tutorials' / videos / guides on system design, so it's >90% rehearsed. So again, my point is that the interviewee is trained and will give you the standard answer even if you deviate some. There are just too many risks otherwise. If I had a more novel approach, I'd risk the interviewer not understanding, or taking longer than the allocated time to finish.
Especially in big tech - interviewers are trained to look for "signals" and not whether you're good or bad. They need to tick certain boxes. Even if you have a "better" answer if it's outside the box it fails.
Then I would have to explain "no, we have caching stuff 'in process', just use that, our app will use more RAM but that's what we need".
But I've seen people use Redis as a glorified "global variable" for stuff like Kafka streaming. The data is already partitioned, it's not going to be used across multiple nodes, and now you've introduced another service to look at and made everything slower because of the network. A global hashmap (or cache library, like previously mentioned) would do the job faster, with less overhead, and the code would be simpler.
Polyglot teams: when you have a big data pipeline running in Java but need to share data with services written in Node/Python.
If you don't have multiple isolated microservices, then Redis is not needed.
Things like: counters, news feeds, chat messages, etc
The cost of delivery for doing these things well with an LSM-based DB or RDB might actually be higher than with Redis. Meaning: you would need more CPUs/memory to deliver this functionality, at scale, than you would with Redis, because of all the overhead of the underlying DB engine.
But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice.
It's not fine. I feel like you're really stretching it thin here in an almost hand-waving way. There are so many cases at far smaller scale where latency is still a primary bottleneck and a crucial metric for valuable and competitive throughput, where the definitively higher latency of pretty much any comparable set of operations performed in a DBMS (like MySQL) will result in large performance loss when compared to a proper key-value store.
An example I personally ran into a few years ago was a basic antispam mechanism (a dead simple rate-limiter) in a telecoms component seeing far below 10k items per second ("QPS"), fashioned exactly as suggested by using already-available MySQL for the counters' persistence: a fast and easy case of SELECT/UPDATE without any complexity or logic in the DQL/DML. Moving persistence to a proper key-value store cut latency to a fraction and more than doubled throughput, allowing for actually processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests, increased competitiveness. Every large customer noticed.
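For comparison, the Redis side of that kind of counter really is tiny. A sketch with the Jedis client; the key scheme and the fixed one-second window are illustrative, not the actual production design:

    import redis.clients.jedis.JedisPooled;

    final class SmsRateLimiter {
        private final JedisPooled redis = new JedisPooled("localhost", 6379);
        private final long limitPerSecond;

        SmsRateLimiter(long limitPerSecond) { this.limitPerSecond = limitPerSecond; }

        // Fixed-window limiter: one counter per sender per second.
        boolean allow(String sender) {
            String key = "rl:" + sender + ":" + (System.currentTimeMillis() / 1000);
            long count = redis.incr(key);   // atomic; creates the key at 1 if absent
            if (count == 1) {
                redis.expire(key, 2);       // old windows expire on their own
            }
            return count <= limitPerSecond;
        }
    }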
That said, I agree that if you need a KV store, use a KV store. Though of course, Postgres can get you close out of the box with `CREATE UNLOGGED TABLE kv (key text PRIMARY KEY, data hstore);`.
The vast majority of companies never need to deal with even one thousand of anything per second. Your situation was absolutely an unusually large scale.
Did you profile the issue?
MySQL's query optimizer/planner/parser perform a lot more "gyrations" than Redis or MemcacheDB do before finally reaching the point of touching the datastore to be read/written, even in the case of prepared statements. Their respective complexities are not really comparable.
I reevaluated it for a job processing context a couple of years ago and opted for websockets instead because what I really needed was something that outlived an HTTP timeout.
I've never actually seen it used in a case where it wasn't an architecture smell. The codebase itself is pretty clean and the ideas it has are good, but the idea of externalizing datastructures like that just doesn't seem that useful if you're building something correctly.
I’ve used Redis for leaderboards and random matchmaking though, stuff which is doable in postgres but is seriously write-heavy and a bit of a faff. Gives you exactly the sort of goodies you need on top of a K/V store without being difficult to set up.
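For anyone who hasn't used it, the sorted-set commands are the whole trick here. A sketch with Jedis 4.x; the key and member names are made up:

    import redis.clients.jedis.JedisPooled;
    import redis.clients.jedis.resps.Tuple;
    import java.util.List;

    class LeaderboardDemo {
        public static void main(String[] args) {
            JedisPooled redis = new JedisPooled("localhost", 6379);

            // One sorted set per leaderboard: member = player id, score = points.
            redis.zincrby("leaderboard:season1", 25.0, "player:123");           // award points
            List<Tuple> top10 = redis.zrevrangeWithScores("leaderboard:season1", 0, 9);
            top10.forEach(t -> System.out.println(t.getElement() + " -> " + t.getScore()));
            Long rank = redis.zrevrank("leaderboard:season1", "player:123");    // 0-based rank, null if absent
            System.out.println("player:123 rank: " + rank);
        }
    }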
As for caching - it’s nice to use as an engineer for sure, but pretty pricey. It wouldn’t be my default choice any more.
I like the sidekiq guy and wish him the best, but for me, the ubiquitous Redis dependency on my Rails apps is forever gone. Unless I actually need a KV store, but even for that, I can get away with PG and not know the difference.
Unfortunately there are still some CTOs out there that haven’t updated their knowledge and are still partying like it’s 2015.
Like, you have to push those kinds of use cases if you’re trying to build a business around it, because a process that runs on your server with your other stuff isn’t a SaaS, and everyone wants to sell SaaS. But it’s far enough outside its ideal niche that I don’t understand why it got popular to use that way.
By the time it was clear we would have been better off with Redis’ sharding solution the team was comfortable with the devil they knew.
I actually agree with the author that Redis was not the right solution for the situations he was presented with, but he's far from proving it is not the solution for a whole host of other problems.
e.g. MySQL 8.0.1+ adds the SKIP LOCKED modifier to SELECT ... FOR UPDATE.
Then you can increment the first available row, or insert a new row if none is free. On read, aggregate the values.
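A rough JDBC sketch of that pattern against MySQL 8.0.1+; the table, column names, and striping scheme are invented for illustration, and error handling is elided:

    import java.sql.*;

    final class StripedCounter {
        // Assumed table (illustrative):
        //   CREATE TABLE counter_stripes (
        //     id BIGINT AUTO_INCREMENT PRIMARY KEY,
        //     name VARCHAR(64) NOT NULL,
        //     value BIGINT NOT NULL,
        //     KEY idx_name (name));

        void increment(Connection conn, String name) throws SQLException {
            conn.setAutoCommit(false);
            // Grab the first stripe nobody else is holding; skip locked rows instead of waiting.
            try (PreparedStatement pick = conn.prepareStatement(
                    "SELECT id FROM counter_stripes WHERE name = ? ORDER BY id " +
                    "LIMIT 1 FOR UPDATE SKIP LOCKED")) {
                pick.setString(1, name);
                try (ResultSet rs = pick.executeQuery()) {
                    if (rs.next()) {
                        try (PreparedStatement upd = conn.prepareStatement(
                                "UPDATE counter_stripes SET value = value + 1 WHERE id = ?")) {
                            upd.setLong(1, rs.getLong(1));
                            upd.executeUpdate();
                        }
                    } else {
                        // Every existing stripe is locked: add a new one rather than block.
                        try (PreparedStatement ins = conn.prepareStatement(
                                "INSERT INTO counter_stripes (name, value) VALUES (?, 1)")) {
                            ins.setString(1, name);
                            ins.executeUpdate();
                        }
                    }
                }
            }
            conn.commit();
        }

        long read(Connection conn, String name) throws SQLException {
            // The true count is the sum of all stripes.
            try (PreparedStatement sum = conn.prepareStatement(
                    "SELECT COALESCE(SUM(value), 0) FROM counter_stripes WHERE name = ?")) {
                sum.setString(1, name);
                try (ResultSet rs = sum.executeQuery()) {
                    rs.next();
                    return rs.getLong(1);
                }
            }
        }
    }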
In the software world in the mid 00's, the trend started to work around the latency, cost and complexity of expensive servers and difficult databases by relying on the speed of modern networks and RAM. This started with Memcached and moved on to other solutions like Redis.
(this later evolved into NoSQL, when developers imagined that simply doing away with the complexity of databases would somehow magically remove their applications' need to do complex things... which of course it didn't, it's the same application, needing to do a complex thing, so it needs a complex solution. computers aren't magic. we have thankfully passed the hype cycle of NoSQL, and moved on to... the hype cycle for SQLite)
But the tradeoff was always working around one limitation by adding another limitation. Specifically it was avoiding the cost of big databases and the expertise to manage them, and accepting the cost of dealing with more complex cache control.
Fast forward to 2025 and databases are faster (but not a ton faster) and cheaper (but not a ton cheaper) and still have many of the same limitations (because dramatically reinventing the database would have been hard and boring, and no software developer wants to do hard and boring things, when they can do hard and fun things, or ignore the hard things with cheap hacks and pretend there is no consequence to that).
So people today just throw a cache in between the application and the database, because 1) databases are still kind of stupid and hard (very very useful, but still stupid and hard) and 2) the problems of cache complexity can be ignored for a while, and putting off something hard/annoying/boring until later is a human's favorite thing.
No, you don't need Redis. Nobody needs Redis. It's a hack to avoid dealing with stateless applications using slow queries on an un-optimized database with no fast read replicas and connection limits. But that's normal now.
Hence, fads dominate. I hate to sound so cynical but that has been my experience in every instance of commercial software development.
Drop Redis, replace with in-memory SQLite.
But for real, the :memory: feature is actually pretty awesome!
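For the curious, an in-memory SQLite database is one connection string away; a sketch assuming the xerial sqlite-jdbc driver is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    class InMemorySqliteDemo {
        public static void main(String[] args) throws Exception {
            // ":memory:" keeps the whole database in RAM; it vanishes when the connection closes.
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)");
                st.execute("INSERT INTO cache VALUES ('greeting', 'hello')");
                try (ResultSet rs = st.executeQuery("SELECT value FROM cache WHERE key = 'greeting'")) {
                    rs.next();
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

The usual caveat applies: a plain :memory: database is private to the connection that opened it, so it works as a per-process cache rather than a shared one.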
If you want to access a bloom filter, cuckoo filter, list, set, bitmap, etc... from multiple instances of the same service, Redis (slash valkey, memorydb, etc...) is really your only option
It also has arrays, sets, and bitstrings, though for the latter you can just as easily (and with less space consumed) map it in your app, and store an integer.
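The "map it in your app and store an integer" part is just bit flags; a minimal sketch (the flag names are invented):

    final class UserFlags {
        static final int EMAIL_VERIFIED = 1 << 0;
        static final int BETA_OPT_IN    = 1 << 1;
        static final int SUSPENDED      = 1 << 2;

        static int set(int flags, int flag)     { return flags | flag; }   // turn a bit on
        static int clear(int flags, int flag)   { return flags & ~flag; }  // turn a bit off
        static boolean has(int flags, int flag) { return (flags & flag) != 0; }
    }

The whole collection of booleans then round-trips through a single integer column.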
this seems like a classic case of impedance mismatch, trying to implement a Redis-ism using an RDBMS.
for a shared list in a relational database, you could implement it like you've said, using an array type or a jsonb column or whatever, and simulate how it works in Redis.
but to implement a "shared list" in a way that meshes well with the relational model...you could just have a table, and insert a row into the table. there's no need for a read-modify-write cycle like you've described.
or, if you really need it to be a column in an existing table for whatever reason, it's still possible to push the modification to the database without the heavy overhead. for example [0]:
> The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
0: https://www.postgresql.org/docs/current/arrays.html#ARRAYS-M...
You’re right that managing lists in RDBMSes is easy-ish, if you don’t have too many of them, and they’re not too large. But, like I mentioned in my original comment, Redis really shines as a complex data structure server. I wouldn’t want to implement my own cuckoo filter in Postgres!
As a cache or ephemeral store like a throttling/rate limiting, lookup tables, or perhaps even sessions store, it's great; but it's impossible to rely on the persistence options (RDB, AOF) for production data stores.
You usually only see this tendency with junior devs, though. It might be a case where "when all you have is a hammer, all you see are nails", or when someone discovers Redis (or during the MongoDB hype cycle ten years ago), which seems like it's in perfect alignment with their language datatypes, but perhaps this is mostly because junior devs don't have as many production-ready databases (from SQL like Postgresql, CockroachDB, Yugabyte to New/NoSQL like ScyllaDB, YDB, Aerospike) to fall back on.
Redis shines as a cache for small data values (probably switch to memcache for larger values, which is simpler key-value but generally 3 to 10 times faster for that more narrow use case, although keep an eye on memory fragmentation and slab allocation)
Just think carefully before storing long-term data in it. Maybe don't store your billing database in it :)
Sure, don't introduce a data store into your stack unless you need it. But if you had to introduce one, Redis still seems like one of the best to introduce? It has fantastic data structures (like sorted sets, hash maps), great performance, robust key expiry, low communication overhead, low runtime overhead... I mean, the list goes on.
https://redis.io/docs/latest/develop/data-types/probabilisti...
Redis as transactional, distributed and/or durable storage is pretty poor. Their "active active" docs on conflict resolution, for example, don't fill me with confidence, given there is no formalism, just vague examples. But this comes from people who not only don't know how to do distributed locks, but also refuse to learn when issues are pointed out to them: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
Every time I find code that claims to do something transactional in Redis which is critical for correctness, not just latency optimization, I get worried.
I thought Redis was single-threaded, running on a single core.
Having multiple cores provides no benefit (and arguably could hurt, since large multicore systems typically have a lower clock).
And one other thing is you should be able to fall back if your cache is invalidated.
In our case we keep a bunch of metadata in redis that’s relatively expensive (in cost and speed) to pull/walk in realtime. And it needs to be fast and support lots of clients. The latter sort of obviates direct-to-database options.
I'm assuming based on the rest of the article that the author and team knew what they were doing, but if you aren't familiar with Postgres' UPDATE strategy and HOT updates [0], you should familiarize yourself before attempting this, otherwise, you're going to generate a massive amount of WAL traffic and dead tuples at scale.
[0]: https://www.postgresql.org/docs/current/storage-hot.html
I think people forget (or don’t know) that adding data storage system to your architecture also involves management, scaling, retention, and backup. It’s not free.
And second, sometimes you do need to invest in storage to permit failover or to minimize latency but people do it for traffic when they really have little traffic. A server from 20 years ago could push a lot of traffic and systems have gotten only beefier.
BAAAHHH HAAHA HA HA HA HHHAA. Ok, look. Just because it's in Redis does not disqualify the clause in the geo service's license that you NOT permanently store the data. The author did not say that's what they were doing but a very similar thing came up in a previous work place for me and we chose to call the service for every lookup the way the service company expected their customers to do. I recall the suggestion in using a caching database as a workaround and it was not made by the engineering team. Sorry, I'm still chuckling at this one...
Then I decided to give up and use it only as an ephemeral cache. I have a large number of standalone Redis instances (actually, now they are Valkey), no storage, only memory, and have Envoy proxy on top of them for monitoring and sharding. And I'm really happy with it, storing hundreds of GBs of data there; if one goes down, only a small part of the data needs to be reloaded from the primary source, and with the Envoy proxy, applications see it as a single Redis server. I was considering just replacing it with memcached, but the Redis data model is richer, so I kept using it, just not expecting anything put there to be actually stored forever.
Don't kill me :')
And you're almost certainly still caching it to RAM; not sure if you're aware that the kernel keeps in-memory copies of recently-read/written pieces of files.
It's easy to work with and does its job so well that for a certain type of dev it's easier to implement caching than to understand enough SQL to add proper indices.
You can use Postgres instead of them, but then it can easily become a bottleneck.
REDIS tradeoffs have been so widely discussed because many, many engineers disagree with them. REDIS is so lowly regarded that some companies ban it completely, making their engineers choose between memcached or something more enterprisey (hazelcast these days, Coherence some time ago).
The source code is magnificent.
Why are you writing Redis in all caps like it's an acronym? Reminds me of those old dog C programmers who write long rants about RUST with inaccuracies that belie the fact they've never actually used it.
Core Redis is a solid piece of technology, afaik
Distributed systems is hard, and what seems like a trivial choice about design constraints can quickly balloon in cost. =3
I have a dedicated server that's running an instance of redis on each of the 32 cores and it's been up for over a year now. Each core is doing about 30k QPS