4 Surprising Truths About Data Models That Will Change How You Think About Code

Introduction: More Than Just Storage

You're spinning up a new service. The first major decision hits your desk: which database to use? It feels like a choice between PostgreSQL and MongoDB, tables and JSON. But the real choice you're making is not about storage; it's about the very language you'll use to think about your problem.

Data models are not just about how software is written; they are, more importantly, about "how we think about the problem that we are solving." The way we structure our data sets the boundaries for what our applications can do and how we can reason about them. This concept echoes a famous philosophical observation:

The limits of my language mean the limits of my world. -Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)

This article explores four counter-intuitive takeaways from the history and theory of data models. They challenge common assumptions and reveal that the choices we make about data are fundamental to the success of our software.

1. Your Shiny New NoSQL Database Might Be a Throwback to the 1960s

It's a surprising claim: the modern, flexible document databases that feel like the cutting edge of technology have a core structure that closely resembles the "hierarchical model." This model was famously used by IBM's Information Management System (IMS), which was first developed for the Apollo space program and commercially released in 1968.

The parallel is striking. Modern document databases represent data as a tree of nested records, often using JSON. IMS did the same. This model is highly effective for one-to-many relationships, which is why it feels so natural for self-contained, document-like data such as a user profile with multiple job positions and educational entries.

However, this historical model shared a critical limitation with its modern counterparts. It struggled with many-to-many relationships, forcing developers into an uncomfortable choice: either duplicate data across records or manually manage references between them. These are "very much like the problems that developers are running into with document databases today."
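To make that trade-off concrete, here is a minimal sketch in JavaScript. The profile shape, the organizations lookup table, and the resolvePositions helper are hypothetical names invented for illustration; they are not taken from any particular database.

```javascript
// One-to-many: positions and education nest naturally inside the profile tree.
const profile = {
  user_id: 251,
  name: "Ada Example",
  positions: [
    { title: "Engineer", org_id: "org1" },
    { title: "Consultant", org_id: "org2" },
  ],
  education: [{ school: "Example University", start: 1990, end: 1994 }],
};

// Many-to-many: many profiles can point at the same organization. Rather than
// copying the organization's details into every profile, each position stores
// a reference (org_id) that the application has to resolve on its own.
const organizations = new Map([
  ["org1", { name: "Example Corp" }],
  ["org2", { name: "Sample LLC" }],
]);

// A hand-written "join": look up every referenced organization.
function resolvePositions(doc, orgs) {
  return doc.positions.map((position) => {
    const org = orgs.get(position.org_id);
    return { title: position.title, organization: org ? org.name : null };
  });
}

const resolved = resolvePositions(profile, organizations);
console.log(resolved);
```

The nested arrays need no extra work to read, while the organization names only appear after the application walks the references itself. That manual step is the part a relational join would otherwise handle.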

This reveals a timeless lesson in technology: many "new" problems are actually old problems re-emerging in a different context. As the source text cites, this is a classic case of "What Goes Around Comes Around."

2. The "Schemaless" Advantage Is a Misleading Myth

The term "schemaless" is one of the biggest selling points for document databases, but it's also highly misleading. The reality is that the application code that reads the data almost always assumes an implicit structure. While the database doesn't enforce a schema, your code certainly expects one.

A more accurate way to describe the difference is through two distinct approaches:

Schema-on-write: This is the traditional relational model. The schema is explicit, and the database ensures all data conforms to it before it is written.
Schema-on-read: This is the document model approach. The structure of the data is implicit and is only interpreted when the data is read by the application.

An excellent analogy from programming languages clarifies this distinction. Schema-on-write is like static (compile-time) type checking, where structure is enforced upfront. Schema-on-read is like dynamic (runtime) type checking, where structure is validated on the fly.
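The contrast can be sketched in a few lines of JavaScript. Everything here is an assumption made for illustration: the userSchema object, the StrictStore class, and the readEmail helper are toy stand-ins, not how any real database enforces or ignores a schema.

```javascript
// Hypothetical schema: field name -> expected result of typeof.
const userSchema = { id: "number", email: "string" };

function conformsTo(schema, record) {
  return Object.entries(schema).every(
    ([field, type]) => typeof record[field] === type
  );
}

// Schema-on-write: the store checks every record before accepting it.
class StrictStore {
  constructor(schema) {
    this.schema = schema;
    this.rows = [];
  }

  insert(record) {
    if (!conformsTo(this.schema, record)) {
      throw new TypeError("record does not match schema");
    }
    this.rows.push(record);
  }
}

// Schema-on-read: the store is a plain list that accepts anything, and the
// reading code carries the implicit schema.
const looseDocs = [];

function readEmail(doc) {
  return typeof doc.email === "string" ? doc.email : null;
}

const strict = new StrictStore(userSchema);
const incomplete = { id: 1 }; // no email field

let rejected = false;
try {
  strict.insert(incomplete);
} catch (err) {
  rejected = err instanceof TypeError;
}

looseDocs.push(incomplete);
const emailOnRead = readEmail(looseDocs[0]);
console.log({ rejected, emailOnRead });
```

In the strict store the problem surfaces at insert time; in the loose one it surfaces only when some reader asks for the missing field, which mirrors the compile-time versus runtime analogy.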

To make this tangible, imagine you need to migrate from a single name field to separate first_name and last_name fields.

With schema-on-read, you start writing new documents with the new fields and handle the difference in your application code, which might look like this:

if (user && user.name && !user.first_name) {
    // Document written before the change
    user.first_name = user.name.split(" ")[0];
}

With schema-on-write, you would perform a formal migration on the database itself:

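-- Add the new column, then backfill it from the existing name column.
ALTER TABLE users ADD COLUMN first_name text;
UPDATE users SET first_name = split_part(name, ' ', 1);  -- split_part is PostgreSQL syntax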

Schema-on-read offers incredible flexibility, especially for heterogeneous data. Schema-on-write, on the other hand, provides invaluable documentation and enforcement of a consistent structure.

3. SQL's Real Superpower Isn't Tables; It's Abstraction

The relational model has dominated for decades, but its longevity isn't just because of tables and rows. The truly revolutionary insight of the relational model was providing a declarative query language: SQL.

The difference between declarative and imperative approaches is fundamental:

An imperative approach tells the computer how to perform operations in a specific, step-by-step order. Older network databases forced developers to manually navigate what was essentially an "n-dimensional data space."
A declarative language like SQL simply specifies the pattern of the data you want, not the algorithm to retrieve it.

The benefit of this abstraction is immense. The database's query optimizer is responsible for figuring out the "how": choosing the most efficient execution plan, deciding which indexes to use, and determining the order of operations. This frees the developer from low-level implementation details and allows the database engine to be optimized without changing application queries.
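A toy version of that division of labor can be sketched in JavaScript. The query object format, the execute function, and the buildIndex helper are all hypothetical and vastly simpler than a real optimizer; the only point is that the caller states what it wants and the engine picks how to find it.

```javascript
// Toy data set (hypothetical).
const animals = [
  { id: 1, name: "Great white", family: "Sharks" },
  { id: 2, name: "Bottlenose", family: "Dolphins" },
  { id: 3, name: "Hammerhead", family: "Sharks" },
];

// Optional secondary index: field value -> array positions of matching rows.
function buildIndex(rows, field) {
  const index = new Map();
  rows.forEach((row, position) => {
    const key = row[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// The "engine": chooses a plan based on which indexes happen to exist.
function execute(query, rows, indexes = {}) {
  const { field, equals } = query.where;
  const index = indexes[field];
  if (index) {
    const positions = index.get(equals) || [];
    return { plan: "index lookup", result: positions.map((p) => rows[p]) };
  }
  return {
    plan: "full scan",
    result: rows.filter((row) => row[field] === equals),
  };
}

// Declarative query: describes the wanted rows, not the retrieval algorithm.
const query = { where: { field: "family", equals: "Sharks" } };

const scanned = execute(query, animals);
const indexed = execute(query, animals, {
  family: buildIndex(animals, "family"),
});
console.log(scanned.plan, indexed.plan);
```

The query object is identical in both calls. Adding the index changes the plan the engine reports but not the rows it returns, which is why an engine can be tuned without touching application queries.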

This declarative power isn't unique to databases. In fact, you probably use it every day on the web. Styling an element with declarative CSS is concise and robust:

li.selected > p {
    background-color: blue;
}

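Now, compare that to imperative JavaScript that tries to achieve the same result. The code is not only longer and harder to understand, but it's also more brittle: the exact className comparison misses any element that carries more than one class, and the highlight is never removed if the selected class is later taken away.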

var liElements = document.getElementsByTagName("li");
for (var i = 0; i < liElements.length; i++) {
    if (liElements[i].className === "selected") {
        var children = liElements[i].childNodes;
        for (var j = 0; j < children.length; j++) {
            var child = children[j];
            if (child.nodeType === 1 && child.tagName === "P") {
                child.setAttribute("style", "background-color: blue");
            }
        }
    }
}

The declarative approach wins by abstracting away the complex "how" and letting you focus on the "what."

4. For Deeply Connected Data, You Need a Different Language

When the relationships in your data become the primary feature, when "anything is potentially related to everything", both relational and document models can become awkward. This is where graph models are the most natural fit, making them ideal for social networks, web graphs, and other highly interconnected datasets.

While you can model graph data in a relational database, queries that traverse a variable number of relationships become extremely cumbersome. For example, finding a location "within" a country is simple for a human but tricky for SQL. In the US, the path might be city -> state -> country, a two-step join. In France, it might be city -> department -> region -> country, a three-step join. The number of joins isn't known in advance, so a query written with a fixed number of joins cannot cover both cases.
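The variable depth is easy to see in a small JavaScript sketch. The within map and the pathTo helper are hypothetical, and the place names are only sample data; the loop simply keeps following parent links until it reaches the target, however many hops that takes.

```javascript
// Hypothetical "within" edges: each location points at the place containing it.
const within = new Map([
  ["Boise", "Idaho"],
  ["Idaho", "United States"],
  ["Grenoble", "Isère"],
  ["Isère", "Auvergne-Rhône-Alpes"],
  ["Auvergne-Rhône-Alpes", "France"],
]);

// Follow "within" edges until the target is reached or the chain runs out.
function pathTo(start, target, edges) {
  const path = [start];
  const seen = new Set(path);
  let current = start;
  while (current !== target) {
    current = edges.get(current);
    // Stop if there is no parent left or the data contains a cycle.
    if (current === undefined || seen.has(current)) return null;
    seen.add(current);
    path.push(current);
  }
  return path;
}

const usPath = pathTo("Boise", "United States", within);
const frPath = pathTo("Grenoble", "France", within);
console.log(usPath.length - 1, "hops;", frPath.length - 1, "hops");
```

The same function handles both countries because the number of hops is discovered at run time rather than written into the query.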

The difference is stark when you compare query languages. Consider a query to find all people who were born in the US and now live in Europe.

A graph query language like Cypher can express this variable-depth search concisely.
The equivalent SQL query requires recursive common table expressions (WITH RECURSIVE).

The result? The query can be written in just 4 lines of Cypher versus 29 lines of "very clumsy" SQL. This isn't a failure of SQL, but a powerful illustration that different data models are designed for different use cases. The most effective choice is the model that best fits the shape of your problem.
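For a feel of what such a query has to do, here is a rough JavaScript sketch over a tiny in-memory property graph. It is neither the Cypher nor the SQL version; the edge list, the labels, and the helper names are all invented for illustration.

```javascript
// Tiny property graph stored as a list of labeled edges (illustrative data).
const edges = [
  { from: "Lucy", label: "BORN_IN", to: "Idaho" },
  { from: "Lucy", label: "LIVES_IN", to: "London" },
  { from: "Alain", label: "BORN_IN", to: "Beaune" },
  { from: "Alain", label: "LIVES_IN", to: "London" },
  { from: "Idaho", label: "WITHIN", to: "United States" },
  { from: "United States", label: "WITHIN", to: "North America" },
  { from: "London", label: "WITHIN", to: "England" },
  { from: "England", label: "WITHIN", to: "United Kingdom" },
  { from: "United Kingdom", label: "WITHIN", to: "Europe" },
  { from: "Beaune", label: "WITHIN", to: "France" },
  { from: "France", label: "WITHIN", to: "Europe" },
];

// Every vertex reachable from `start` via zero or more WITHIN edges.
function regionsContaining(start) {
  const reached = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const edge of edges) {
      if (edge.label === "WITHIN" && edge.from === current && !reached.has(edge.to)) {
        reached.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return reached;
}

// Places a person is connected to by a given edge label.
function targets(person, label) {
  return edges
    .filter((edge) => edge.from === person && edge.label === label)
    .map((edge) => edge.to);
}

// People born somewhere inside `birthRegion` who live inside `homeRegion`.
function bornInAndLivesIn(birthRegion, homeRegion) {
  const people = new Set(
    edges.filter((edge) => edge.label === "BORN_IN").map((edge) => edge.from)
  );
  return [...people].filter(
    (person) =>
      targets(person, "BORN_IN").some((p) => regionsContaining(p).has(birthRegion)) &&
      targets(person, "LIVES_IN").some((p) => regionsContaining(p).has(homeRegion))
  );
}

const emigrants = bornInAndLivesIn("United States", "Europe");
console.log(emigrants);
```

Most of the code is traversal plumbing. A graph query language builds that traversal into its syntax, which is where the brevity comes from.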

Conclusion: A Converging Future

These insights reveal that our data models are not just containers; they are frameworks for thought that shape our solutions. We've seen that history often repeats itself, that the true value of a tool like SQL is its power of abstraction, and that choosing the right model for the shape of your data is critical.

Looking ahead, the lines between these models are blurring. Relational databases are adding robust support for JSON, and document databases are improving their capabilities for joins and relationships. This trend of convergence suggests that "a hybrid of the relational and document models is a good route for databases to take in the future."

As our tools evolve, will the next great software innovation come not from a new model, but from how we creatively combine the old ones?
