02/03/2026
7 min read
System Design
Databases
ChaiCode
For any fast-growing system, one of the biggest problems to solve is scaling. The system must handle many concurrent users, and a crash is not an option: downtime can lead to major economic losses and reputation damage. So how do we prevent it?
Let’s design a basic system.
We’ll start with a load balancer that distributes traffic across four servers.
Now we have an option: we can make our servers stateful. Sounds simple, right?
But here’s the problem. Suppose some user’s session data is stored on Server 1. The next time that same user visits the page, how will the load balancer know that their data is on Server 1? Without any special logic, the load balancer could send them to Server 3. That’s a problem.
One solution is to use sticky sessions. The load balancer can apply rules — for example, round-robin with session affinity — so that the same user keeps getting routed to the same server.
Sticky sessions don’t actually store your data inside the load balancer. That’s a common misunderstanding. The load balancer isn’t holding your session state. It simply remembers that “User A” should always be routed to “Server 1.” It binds the client to a specific server.
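To make that concrete, here is a minimal sketch of session affinity in JavaScript. Everything here (`servers`, `affinity`, `routeRequest`) is a made-up illustration of the idea, not a real load balancer API:

```javascript
// Hypothetical sticky-session router — the load balancer stores only a
// userId -> server binding, never the session data itself.
const servers = ["server-1", "server-2", "server-3", "server-4"];
const affinity = new Map(); // userId -> server
let next = 0;

function routeRequest(userId) {
  // First visit: assign a server round-robin and remember the binding.
  if (!affinity.has(userId)) {
    affinity.set(userId, servers[next % servers.length]);
    next += 1;
  }
  // Every later request from this user lands on the same server.
  return affinity.get(userId);
}

console.log(routeRequest("userA")); // "server-1"
console.log(routeRequest("userB")); // "server-2"
console.log(routeRequest("userA")); // "server-1" again — sticky
```

Notice that the `Map` holds only a server name per user, nothing about the session itself — the session data still lives on the server.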
Does that solve it?
Not entirely.
What happens if that specific server crashes? All the session data tied to it is gone. Now your stateful approach starts breaking down.
So maybe making the application layer stateful isn’t the best idea.
Instead, we keep our servers stateless.
But then the big question: where does the data go?
For that, we introduce a separate machine dedicated to storing data: a database.
Now we have a single database instance. All backend servers are stateless and fetch data from this database. That means every read and every write goes to this one database.
But if every read and every write hits a single database, won't that overwhelm the server and make it slow? Yes. That becomes a bottleneck.
So what can we do to prevent this bottleneck?
We can implement vertical scaling: adding more resources (more CPU, more RAM, faster SSDs) to the same machine.
Or could we just add another database server to share the load? The answer is no, because that would lead to data inconsistency between the two databases.
So vertical scaling sounds good. It removes the bottleneck for now, and we keep scaling the database up as the business grows and the number of reads and writes increases.
But there are trade-offs:
Cost - high-end machines are expensive.
Downtime - as traffic grows and we scale up, there is some delay while adding resources, and during that window there can be business losses. We can reduce this by preplanning, i.e. prewarming the server before an expected traffic surge, but what if the timing isn't right? :(
To conclude: vertical scaling is a good solution, but it comes with trade-offs. And beyond the bottleneck we just addressed, one major problem remains: the single point of failure. If the single instance goes down, everything crashes.
So what do we do now?
Instead of making one machine stronger, we add more machines.
We introduce replication.
We add multiple replica (clone) instances. The primary database is the master, and the others are replicas whose sole purpose is serving reads: all reads go to the replicas (called read replicas) and all writes go to the master. This architecture is called primary-replica, or master-slave, architecture.
This reduces the load on the primary. The replicas are periodically synced with the primary to stay up to date (there is some delay while syncing). In this way we have created a distributed system to handle reads and writes.
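The routing rule described above can be sketched in a few lines of JavaScript. The names (`primary`, `replicas`, `routeQuery`) and the naive SQL check are made up for illustration; in production this is handled by drivers or proxies:

```javascript
// Hypothetical query router for a primary-replica setup: writes always
// go to the primary, reads are spread across read replicas round-robin.
const primary = "primary-db";
const replicas = ["replica-1", "replica-2"];
let rr = 0;

function routeQuery(sql) {
  // Crude write detection, just for the sketch.
  const isWrite = /^(INSERT|UPDATE|DELETE)\b/i.test(sql.trim());
  if (isWrite) return primary; // all writes -> primary
  const target = replicas[rr % replicas.length]; // reads load-balanced
  rr += 1;
  return target;
}

console.log(routeQuery("INSERT INTO posts VALUES (1)")); // "primary-db"
console.log(routeQuery("SELECT * FROM posts")); // "replica-1"
console.log(routeQuery("SELECT * FROM posts")); // "replica-2"
```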
Trade-offs :
Cost increases (more machines).
Storage consumption increases.
Data is not always real-time on replicas due to replication lag.
To handle the real-time problem, we let the application decide which data needs to be real-time; that part we read from the primary, so it is always fresh.
For example: Instagram view counts can be read from a read replica, since a small delay doesn't matter, but an update to a caption cannot lag behind, so there we read from the primary in real time.
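That application-level decision can be expressed as a tiny helper. The flag name `requiresFresh` and the function are invented for this sketch:

```javascript
// Hypothetical read routing by freshness requirement: lag-tolerant
// reads (view counts) hit a replica; freshness-critical reads
// (a caption the user just edited) hit the primary.
function pickReadTarget(requiresFresh, primary, replicas) {
  if (requiresFresh) return primary; // read-your-own-writes from primary
  // Any replica is fine when a little replication lag is acceptable.
  return replicas[Math.floor(Math.random() * replicas.length)];
}

console.log(pickReadTarget(true, "primary-db", ["replica-1"])); // "primary-db"
console.log(pickReadTarget(false, "primary-db", ["replica-1", "replica-2"])); // one of the replicas
```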
Also, if the primary goes down, one of the replicas can be promoted to primary.
Replication helps when reads are the bottleneck.
But if write traffic becomes very high, replication alone is not enough because all writes still go to one primary :(
So now we will implement an even better technique.
Sharding is a technique where the database is split into multiple independent database instances called shards.
Each shard holds a unique subset of the data, and we index that data so it can be located quickly.
To decide which shard stores which data, we use a shard key.
A shard key is a field or combination of fields (indexed) in a database table that determines how data is distributed across the shards.
Now, to determine which shard stores a given record, we apply a hash function.
It is an algorithm that takes the userId as input and returns the index of the shard where that user's data lives.
// Simple modulo-based hash: maps a userId to one of the shards.
let userId = 1;
let numOfServers = 3;

function getShardIndex(userId) {
  return userId % numOfServers; // hash value = shard index
}

const index = getShardIndex(userId);
write(db[index]); // route the write to that shard
read(db[index]);  // route the read to the same shard
This way our routing stays consistent: the same user always maps to the same shard. And if reads or writes overwhelm any single shard, we can create read replicas of that particular shard, applying horizontal scaling at this level too. This makes the design even more optimized.
Now, have you noticed any problem here?
Trade-offs :
Resharding : If we add a new server, numOfServers changes from 3 to 4, and the hash calculation changes with it. Data that was on shard 1 may now belong to shard 2. To fix this we have to move data around, which is an expensive overhead and carries a risk of data loss. Consistent hashing resolves this issue, and I will cover it in another blog.
Expensive Cross-Shard Joins : If a query needs data from multiple shards, the application must fetch from each shard and merge results. Joins become expensive and complex.
Celebrity problem : This is also known as the Hotspot Key Problem. Imagine the data for Virat Kohli, Talwiinder, and Karan Aujla all ends up on the same shard. On Instagram, that shard would be overwhelmed with read operations. To solve this, we may need to allocate a shard per celebrity, and each such shard might even require further partitioning.
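One common mitigation for the celebrity problem can be sketched as a hot-key override checked before the normal hash. The shard names and the `hotKeys` map below are invented for illustration:

```javascript
// Hypothetical hot-key routing: celebrity accounts get dedicated
// shards; everyone else falls through to the normal modulo hash.
const numOfServers = 3;
const hotKeys = new Map([
  ["virat_kohli", "celebrity-shard-A"],
  ["karan_aujla", "celebrity-shard-B"],
]);

function getShard(userKey, userId) {
  if (hotKeys.has(userKey)) return hotKeys.get(userKey); // dedicated shard
  return `shard-${userId % numOfServers}`;               // normal path
}

console.log(getShard("virat_kohli", 42)); // "celebrity-shard-A"
console.log(getShard("random_user", 42)); // "shard-0"
```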
But overall this is a great architecture, and some companies simply accept these trade-offs.
Every system design has trade-offs.
Scaling a database is not about jumping to complex architectures from day one. It’s about evolving with your bottlenecks.
We start with a single database.
When it becomes slow, we try vertical scaling.
When reads increase, we introduce replication.
When writes become the bottleneck, we move to sharding.
Each solution solves one problem and introduces new trade-offs like cost, complexity, replication lag, resharding overhead, and hotspot issues.
There is no perfect architecture. There is only the right architecture for your current scale.
Good system design is not about overengineering early.
It’s about identifying bottlenecks, understanding trade-offs, and scaling deliberately.