NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
B-Trees: More Than I Thought I'd Want to Know (benjamincongdon.me)
sgloutnikov 4 days ago [-]
Great article! Here's one more, including animations on B-trees: https://planetscale.com/blog/btrees-and-database-indexes
GamerAlias 4 days ago [-]
Personally this was more useful than the original article. Original article mostly aimed at someone who is slightly more familiar with BTrees which is fine
dxdm 4 days ago [-]
I don't know, this article is immediately confusing, because the rules seem wrong, or inconsistent at least. That makes it harder to understand what's going on, unless I suppose one knows already.

The first rule does not allow trees containing less than 2 elements.

The second rule makes me wonder how "N" is being used in the rules. The first rule treats it as a per-node variable, per the second rule it has to be a variable external to the node. Also, what if N is not divisible by 2?

Now that I don't trust the rules anymore, I cannot tell what the third rule is trying to do or if it's even correct, because so far it has not been stated what purpose these rules are trying to accomplish.

Rule 4 contradicts rule 1.

Now, the ordering rules (and the demo) do not allow multiple elements with the same key. Is this on purpose? Database indexes support this, so it would be nice to get one sentence about, so I don't have to wonder why this general introduction does not seem to deal with it.

I'm only this far in, and it already threw multiple wrenches into my attempts at thinking along. Now I can try to fill the gaps in the explanation myself, but I'm wondering how much I can trust the interactive elements to test my own understanding.

This article has a lot of potential, but it could really use an editing pass or two.

hiimkeks 4 days ago [-]
> Slotted pages are composed of three components: a header (containing metadata about the page), cells (variable-sized “slots” for data to be stored in), and offset pointers (an array of pointers to those cells). The benefit of this layout is that you can store variable sized data, since the cells can be of variable size, and you don’t need to move that data to logically reorder it. Reordering the positions of the pointers in the pointer array is sufficient. This is inexpensive since the pointers are small, and in a well-known position at the beginning of the page. In other words, as long as the offset pointers are ordered in key-sorted order, it doesn’t matter where in the actual page the keys are stored.

Does that really make a difference if we have to re-write the entire tree node anyway, even if it's just for reordering the pointers (because node == page)?

tomnipotent 4 days ago [-]
> Does that really make a difference if we have to re-write the entire tree node

The I/O write is a fixed cost, but sorting the data to match the pointers would require additional CPU cycles for an outcome that wouldn't really make a difference for subsequent operations.

QuadrupleA 4 days ago [-]
I made a SQLite disk-page explorer a little while ago, helped me understand B-trees a lot better:

https://github.com/QuadrupleA/sqlite-page-explorer

nayuki 4 days ago [-]
I understood B-trees for their own sake since 2017 ( https://www.nayuki.io/page/btree-set ), not motivated by SQLite.

But relevant to what you shared, I wrote some tools in 2023 to parse SQLite files in TypeScript, right in the web browser: https://www.nayuki.io/page/sqlite-database-file-visualizatio...

prpl 5 days ago [-]
Missing b-link trees/concurrency/locking, but maybe that really is more than you want to know
iambvk 4 days ago [-]
I am working on implement on-disk B+Tree for the last few months. Man, keeping the on-disk state consistent with proper locking is a real challenge -- specially when we want to avoid IO-holding-mutex. And forward/backward iterators make me doubt the correctness.

All this with just fixed size keys/values. I am yet to start on variable sized keys and values, but I already want to give up on my initial performance targets.

brcmthrowaway 4 days ago [-]
Resource for that?
apavlo 4 days ago [-]
kuharich 5 days ago [-]
qianli_cs 5 days ago [-]
Great article — it clearly explains “The devil is in the details” :) Would love to see another one for LSM-Tree, and the comparison between B-Trees and LSM-Trees.
dibyendu 4 days ago [-]
I implemented a concurrent recoverable B-link Tree some years ago based on research publication by Ibrahim Jaluta. Some details are here:

https://simpledbm.readthedocs.io/en/latest/developerguide.ht...

Implementation is here: https://github.com/dibyendumajumdar/simpledbm

shayansm1 5 days ago [-]
a good source for implementing b-tree using golang: https://build-your-own.org/database/04_btree_code_1
samlightfoot 4 days ago [-]
A big fan of this article. I was in a similar position to the author in having a 'vague understanding'. Excellent for anyone wanting to solidify their mental model.
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 20:31:10 GMT+0000 (UTC) with Wasmer Edge.