Why traffic spikes crash systems and how modern architectures prevent it
24/02/2026
5 min read
System Design
thundering-herd
ChaiCode
server
backend
Databases
Load Balancing
Imagine the CBSE Class 12 board results website announces that results will go live at exactly 10 AM on a particular day. Before 10 AM, thousands of students are waiting and refreshing the page.
The moment the results go live, all the students send requests to the server at the same time. Instead of traffic increasing gradually, the system suddenly jumps from almost no requests to thousands of requests in a split second. This sudden spike overwhelms the backend, causing crashes.
This situation, where a large number of clients simultaneously request a resource the moment it becomes available, is called the Thundering Herd Problem.
In operating-system terms, a large number of processes or threads wait for the same event; when that event happens, they all wake up and attempt to acquire the same resource at the same time. This creates a sudden spike in load, leading to bottlenecks, high latency, wasted work, and system crashes.
After the processes wake up, they all demand the resource, and a decision must be made about which process may continue. Once the decision is made, the remaining processes are put back to sleep, only to wake up again and contend for the resource once more.
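This wake-everyone-then-contend pattern is easy to reproduce. The toy Python sketch below (all names and counts are illustrative, not from any real system) blocks fifty threads on one shared event; the moment it fires, every thread wakes at once and stampedes toward a single lock:

```python
import threading
import time

# Illustrative sketch: many worker threads block on the same event; when it
# fires, all of them wake at once and contend for a single shared resource.
event = threading.Event()          # the shared event everyone waits on
resource_lock = threading.Lock()   # the single resource they all want
wakeups = []                       # records which threads woke up

def worker(worker_id: int) -> None:
    event.wait()                   # all threads sleep here...
    wakeups.append(worker_id)      # ...and all wake at the same moment
    with resource_lock:            # only one thread wins; the rest block again
        time.sleep(0.001)          # simulate briefly using the resource

threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]
for t in threads:
    t.start()

event.set()                        # the "event happens": 50 threads stampede
for t in threads:
    t.join()

print(len(wakeups))                # all 50 threads woke up for one resource
```

All fifty threads are woken for a resource that only one of them can use at a time; the other forty-nine burn scheduler time just to go back to waiting.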
When a large number of users request the same data, the backend first checks the cache.
If the data is present, it is returned immediately without touching the database. However, when the cache entry expires (TTL ends) or the cache is empty (cold start), all concurrent requests observe a cache miss.
Instead of a single request rebuilding the cache, thousands of requests simultaneously fall through to the database and attempt to fetch the same data.
This sudden spike overwhelms the database causing slowdowns, timeouts, or complete system failure.
This behavior is known as a cache stampede.
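A cache stampede can be simulated in a few lines. In this hypothetical sketch (the key name, thread count, and "database" are all stand-ins), a barrier forces 100 requests to observe the cold cache at effectively the same instant, so every one of them falls through to the database:

```python
import threading

# Toy simulation of a cache stampede: 100 concurrent requests all observe a
# miss on the same key, and every one of them falls through to the "database".
cache: dict[str, str] = {}
barrier = threading.Barrier(100)   # forces all requests to run concurrently
db_lock = threading.Lock()
db_queries = 0                     # how many times the database was hit

def query_database(key: str) -> str:
    # Stand-in for an expensive database read.
    global db_queries
    with db_lock:
        db_queries += 1
    return f"value-for-{key}"

def handle_request(key: str) -> None:
    hit = key in cache             # every request checks the cold cache...
    barrier.wait()                 # ...at effectively the same instant
    if not hit:
        cache[key] = query_database(key)   # the whole herd hits the database

threads = [threading.Thread(target=handle_request, args=("result",))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_queries)   # 100 queries for data that only needed one fetch
```

One hundred identical queries run for data that only needed to be fetched once; in production, multiply that by thousands of app servers and the database falls over.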
Unexpected CPU spikes
The system jumps from handling mostly cache reads to processing thousands of expensive database queries at once.
Increased latency
Requests queue up behind overloaded thread pools and database connections, causing response times to grow rapidly.
Resource waste
The same data is recomputed or fetched thousands of times even though only one result is needed, wasting CPU, memory, and network bandwidth.
After reading this far, you might wonder whether a thundering herd is just a normal traffic spike. It is not: both result in a sudden surge of traffic, but they are fundamentally different in the pattern of the spike.
Normal spike : The increase in traffic comes from external events like breaking news or sales, and usually ramps up over seconds or minutes. It is typically handled by autoscaling and load balancers.
Thundering herd : A specific scenario where a massive number of requests hit a system simultaneously, often because a cache has expired or a resource has become available, causing all waiting processes to rush for it at once. It requires specific mitigation strategies to prevent.
In 2010, Facebook experienced a multi-hour outage triggered by a configuration change to a persistent storage value. The new configuration was interpreted as invalid by cache clients across the infrastructure. Each client attempted to “repair” the invalid configuration by querying backend database clusters, generating hundreds of thousands of queries per second and causing widespread database failures.
Initial condition: Configuration value cached across all application servers
Trigger: Configuration updated to value that failed client-side validation
Amplification: Each server detected invalid value and attempted database repair query
Cascading failure: Database clusters overwhelmed by synchronized query load
Recovery impediment: Cache invalidation loops prevented normal service restoration
To prevent this, there are a few strategies we can follow:
Request coalescing : Prevents a thundering herd by ensuring that when many identical requests arrive during a cache miss, only one request fetches the data while the others wait for its result.
Mutex : A mechanism that allows only one process or thread to access a shared resource at a time, while the others wait. This prevents duplicate work, race conditions, and data corruption. Used for cache rebuilds, it is also called a request-coalescing lock.
Stale-while-revalidate : Serve the old (stale) cached data immediately while refreshing the data in the background. Once the refresh completes, the cache is updated for future requests.
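Request coalescing and the mutex strategy combine naturally into a "single-flight" cache read. The sketch below is a minimal illustration (the per-key lock registry and names are assumptions, not a real library API): a lock around the rebuild means only one thread queries the database, and a re-check after acquiring the lock lets the losers reuse its result:

```python
import threading

# Minimal single-flight sketch: a per-key lock ensures only one request
# rebuilds the cache on a miss; concurrent requests wait and reuse the result.
cache: dict[str, str] = {}
key_locks: dict[str, threading.Lock] = {}
locks_guard = threading.Lock()     # protects the key_locks registry itself
db_queries = 0                     # how many times the database was hit

def query_database(key: str) -> str:
    global db_queries
    db_queries += 1                # safe here: only called under the key lock
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:               # fast path: cache hit
        return cache[key]
    with locks_guard:              # one lock object per key
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:                     # coalescing point: only one fetcher runs
        if key in cache:           # re-check: another thread may have won
            return cache[key]
        value = query_database(key)   # exactly one database query
        cache[key] = value
        return value

threads = [threading.Thread(target=get, args=("result",)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_queries)   # 1: the herd collapsed into a single database query
```

The same 100 concurrent requests from the stampede scenario now produce a single database query; everyone else blocks briefly on the lock and is served from the freshly filled cache.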

Beyond these, there are plenty more strategies to prevent the thundering herd problem.
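Stale-while-revalidate is also straightforward to sketch. In this toy Python version (the cache layout, TTL value, and key names are illustrative assumptions, not a real library API), an expired entry is served immediately while at most one background thread refreshes it:

```python
import threading
import time

# Illustrative stale-while-revalidate sketch, not a production cache.
cache: dict[str, tuple[str, float]] = {}   # key -> (value, expires_at)
refresh_lock = threading.Lock()            # at most one background refresh
TTL = 60.0                                 # assumed TTL in seconds

def query_database(key: str) -> str:
    time.sleep(0.01)                       # stand-in for a slow database read
    return f"fresh-{key}"

def get(key: str) -> str:
    value, expires_at = cache[key]
    if time.time() >= expires_at and refresh_lock.acquire(blocking=False):
        def refresh() -> None:
            try:                           # rebuild the entry off the hot path
                cache[key] = (query_database(key), time.time() + TTL)
            finally:
                refresh_lock.release()
        threading.Thread(target=refresh).start()  # refresh in the background
    return value                           # stale value is served immediately

# Seed the cache with an already-expired entry, then read it twice.
cache["result"] = ("stale-result", time.time() - 1)
first = get("result")                      # returns instantly, no DB wait
time.sleep(0.5)                            # let the background refresh finish
second = get("result")
print(first, second)                       # stale-result fresh-result
```

No request ever waits on the database: readers always get an answer immediately, at the cost of occasionally serving slightly outdated data.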
So to summarize what we learnt so far is that the thundering herd problem happens when many requests react to the same event at once and overwhelm a shared resource, often during cache expiry or service recovery. Techniques like request coalescing, mutex locks, and stale-while-revalidate help prevent duplicate work and keep systems stable under spikes.
Hope you learned something useful from this post :) feel free to share it with your friends!