NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
B-Trees: More Than I Thought I'd Want to Know (2021) (benjamincongdon.me)
sgloutnikov 189 days ago [-]
Great article! Here's one more, including animations on B-trees: https://planetscale.com/blog/btrees-and-database-indexes
GamerAlias 189 days ago [-]
Personally this was more useful than the original article. Original article mostly aimed at someone who is slightly more familiar with BTrees which is fine
dxdm 189 days ago [-]
I don't know, this article is immediately confusing, because the rules seem wrong, or inconsistent at least. That makes it harder to understand what's going on, unless I suppose one knows already.

The first rule does not allow trees containing less than 2 elements.

The second rule makes me wonder how "N" is being used in the rules. The first rule treats it as a per-node variable, per the second rule it has to be a variable external to the node. Also, what if N is not divisible by 2?

Now that I don't trust the rules anymore, I cannot tell what the third rule is trying to do or if it's even correct, because so far it has not been stated what purpose these rules are trying to accomplish.

Rule 4 contradicts rule 1.

Now, the ordering rules (and the demo) do not allow multiple elements with the same key. Is this on purpose? Database indexes support this, so it would be nice to get one sentence about, so I don't have to wonder why this general introduction does not seem to deal with it.

I'm only this far in, and it already threw multiple wrenches into my attempts at thinking along. Now I can try to fill the gaps in the explanation myself, but I'm wondering how much I can trust the interactive elements to test my own understanding.

This article has a lot of potential, but it could really use an editing pass or two.

hiimkeks 189 days ago [-]
> Slotted pages are composed of three components: a header (containing metadata about the page), cells (variable-sized “slots” for data to be stored in), and offset pointers (an array of pointers to those cells). The benefit of this layout is that you can store variable sized data, since the cells can be of variable size, and you don’t need to move that data to logically reorder it. Reordering the positions of the pointers in the pointer array is sufficient. This is inexpensive since the pointers are small, and in a well-known position at the beginning of the page. In other words, as long as the offset pointers are ordered in key-sorted order, it doesn’t matter where in the actual page the keys are stored.

Does that really make a difference if we have to re-write the entire tree node anyway, even if it's just for reordering the pointers (because node == page)?

tomnipotent 189 days ago [-]
> Does that really make a difference if we have to re-write the entire tree node

The I/O write is a fixed cost, but sorting the data to match the pointers would require additional CPU cycles for an outcome that wouldn't really make a difference for subsequent operations.

QuadrupleA 189 days ago [-]
I made a SQLite disk-page explorer a little while ago, helped me understand B-trees a lot better:

https://github.com/QuadrupleA/sqlite-page-explorer

nayuki 188 days ago [-]
I understood B-trees for their own sake since 2017 ( https://www.nayuki.io/page/btree-set ), not motivated by SQLite.

But relevant to what you shared, I wrote some tools in 2023 to parse SQLite files in TypeScript, right in the web browser: https://www.nayuki.io/page/sqlite-database-file-visualizatio...

prpl 189 days ago [-]
Missing b-link trees/concurrency/locking, but maybe that really is more than you want to know
iambvk 189 days ago [-]
I am working on implement on-disk B+Tree for the last few months. Man, keeping the on-disk state consistent with proper locking is a real challenge -- specially when we want to avoid IO-holding-mutex. And forward/backward iterators make me doubt the correctness.

All this with just fixed size keys/values. I am yet to start on variable sized keys and values, but I already want to give up on my initial performance targets.

brcmthrowaway 189 days ago [-]
Resource for that?
apavlo 188 days ago [-]
kuharich 189 days ago [-]
qianli_cs 189 days ago [-]
Great article — it clearly explains “The devil is in the details” :) Would love to see another one for LSM-Tree, and the comparison between B-Trees and LSM-Trees.
shayansm1 189 days ago [-]
a good source for implementing b-tree using golang: https://build-your-own.org/database/04_btree_code_1
dibyendu 188 days ago [-]
I implemented a concurrent recoverable B-link Tree some years ago based on research publication by Ibrahim Jaluta. Some details are here:

https://simpledbm.readthedocs.io/en/latest/developerguide.ht...

Implementation is here: https://github.com/dibyendumajumdar/simpledbm

samlightfoot 189 days ago [-]
A big fan of this article. I was in a similar position to the author in having a 'vague understanding'. Excellent for anyone wanting to solidify their mental model.
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 06:55:00 GMT+0000 (UTC) with Wasmer Edge.