Scaling WordPress to 112,000 Sites

Most WordPress installs are one site. Some are a handful. A big one might run a few dozen blogs under a single network and call it a day.

News.net is more than 112,000 sites on one WordPress platform. Suburb pages, towns, local government areas, regions, states, countries and continents, all interlinked, all fed from a central hub, all able to run on their own and in concert with everything above them. A story can land from a wire service and be classified, sized, syndicated and live across the relevant slice of those sites within seconds.

That scale breaks almost every assumption WordPress is built around. This is the engineering that made it hold.

People love to say WordPress can’t scale. What they usually mean is that the default configuration can’t scale, which is true and uninteresting. The platform underneath is a capable PHP runtime. We’ve made that argument in the abstract before. News.net is the version with receipts.

Why it runs on WordPress at all

Nobody sat down and decided to model a global news network on WordPress. It started as a WordPress prototype, the prototype worked, and it kept working as the network grew. It scaled far better than anyone expected, and at no point did ripping out the foundation become a better idea than improving it. The reasons to stay had little to do with scaling and a lot to do with the boring practicalities that sink most platforms.

Start with the people who use it all day. WordPress comes with an editing and publishing experience that journalists and editors already know. There was no custom CMS to learn, no training budget for an admin nobody had seen before, and people who write for a living could get on with writing. That sounds minor until you’ve watched a platform fail because the people stuck using it every day quietly came to hate it.

It also keeps hiring sane. There’s a deep pool of developers who know WordPress and PHP, so a team can be built and grown from common skills instead of the handful of people who happen to understand one company’s private platform. A bespoke system is only ever as maintainable as your ability to keep staffing it.

And the ecosystem is there when you want it, even though News.net barely touches it. The platform leans on very few plugins, and almost all of the heavy lifting is custom code written for this job. Choosing WordPress keeps the door to that ecosystem open anyway, so new functionality can be reached for from a mature body of existing work rather than built from nothing every time the product grows.

One platform, 112,000 fronts

The whole network is a single WordPress multisite install on a subdomain layout. Every locality is its own blog with its own front page, its own editorial pins and its own audience, but it shares the codebase, the media library and the content pool with everything else.

The sites are not a flat list. They sit in a strict hierarchy:

world → continent → country → state → region → lga → local

Each site carries one flag that says what level it is, plus a pointer to its parent. A Kearneys Spring local site points up to Toowoomba, which points up to a region, then Queensland, then Australia, then the world hub at the root. That chain is the spine of the entire product. It decides what news a site shows, which parent it inherits from when it has no story of its own, and how content rolls up and down the network.
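As a minimal sketch of that chain, assume each site is reduced to a record holding its level and a parent ID. The array shape, the IDs, the placeholder region name and the function name ancestor_chain are illustrative and not the production schema; the continent level is left out to mirror the example above.

```php
<?php
// Hypothetical in-memory stand-in for the per-site level flag and parent pointer.
$sites = [
    1 => ['name' => 'World',           'level' => 'world',   'parent' => null],
    2 => ['name' => 'Australia',       'level' => 'country', 'parent' => 1],
    3 => ['name' => 'Queensland',      'level' => 'state',   'parent' => 2],
    4 => ['name' => 'Example Region',  'level' => 'region',  'parent' => 3],
    5 => ['name' => 'Toowoomba',       'level' => 'lga',     'parent' => 4],
    6 => ['name' => 'Kearneys Spring', 'level' => 'local',   'parent' => 5],
];

/**
 * Return site IDs from $site_id up to the root, nearest first.
 * The $seen guard stops a bad parent pointer from looping forever.
 */
function ancestor_chain(array $sites, int $site_id): array
{
    $chain = [];
    $seen = [];
    $current = $site_id;
    while ($current !== null && isset($sites[$current]) && !isset($seen[$current])) {
        $seen[$current] = true;
        $chain[] = $current;
        $current = $sites[$current]['parent'];
    }
    return $chain;
}
```

Calling ancestor_chain($sites, 6) walks from the local site through each parent to the hub. Anything that needs to inherit from above can iterate that list in order.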

You don’t build 112,000 sites by hand in wp-admin. We generated them from source data, looping over locality spreadsheets and calling WordPress’s site-creation functions directly, setting each site’s level flag and parent pointer as it was created. Creating the site is the easy half. Modelling the relationships so the network behaves like a coherent whole, rather than 112,000 unrelated blogs that happen to share a database, was the actual work.

The database is the first thing to break

At this scale, WordPress’s own tables turn into the problem. wp_postmeta and wp_options grow without any natural ceiling, every site multiplies the row count, and a query pattern that’s fine on one blog becomes a table scan that takes the whole site down.

The first fix is to stop sending every read to the same database. We run HyperDB in front of a managed database cluster and split reads from writes. One endpoint takes all the writes. A pool of read replicas takes the front-end read traffic, which is the overwhelming majority of it.

// Writer: all writes, plus reads only inside wp-admin as a fallback
$wpdb->add_database([
    'host'    => "{$writer_host}:{$port}",
    'write'   => 1,
    'read'    => 2,            // lower priority than the dedicated reader
    'dataset' => 'global',
    'timeout' => 0.2,
]);

// Reader: front-end reads land here first
$wpdb->add_database([
    'host'    => "{$reader_host}:{$port}",
    'write'   => 0,
    'read'    => 1,
    'dataset' => 'global',
    'timeout' => 0.2,
]);

Spreading reads across a pool of replicas rather than one fixed server matters more than it looks. The database can add and remove replicas under load, and connections fan out across whatever is currently healthy. The admin panel still reads from the writer, so editors get a consistent view even while the public side is served entirely off replicas.

There is a sharper database lesson buried in the locality data. Early on, the suburb, region and area taxonomies were modelled as hierarchical WordPress taxonomies, because hierarchical taxonomies give editors a nice nested checklist. With tens of thousands of locality terms, that choice quietly becomes a trap. Saving a post starts timing out, term queries crawl, and the cause is the way WordPress stores hierarchy for taxonomies. The fix was to make the large taxonomies non-hierarchical and rebuild the parent and child relationships ourselves through custom fields. That costs editors the tidy nested checkbox UI they liked, but at this volume it stops being a hard trade the moment saves start timing out.
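One way to picture the rebuilt relationships, as a sketch only: treat the terms as a flat list where each term keeps a reference to its parent, and derive the tree in application code when it is needed. The map below stands in for a parent ID held in a custom field; the IDs and function names are hypothetical.

```php
<?php
// Hypothetical flat locality terms: term ID => parent term ID (0 = top level).
$parent_of = [10 => 0, 20 => 10, 21 => 10, 30 => 20];

/** Build a parent => children index in one pass over the flat map. */
function children_index(array $parent_of): array
{
    $children = [];
    foreach ($parent_of as $term_id => $parent_id) {
        $children[$parent_id][] = $term_id;
    }
    return $children;
}

/** Collect every descendant of $term_id from the prebuilt index, sorted by ID. */
function descendants(array $children, int $term_id): array
{
    $out = [];
    $seen = [];
    $stack = $children[$term_id] ?? [];
    while ($stack) {
        $id = array_pop($stack);
        if (isset($seen[$id])) {
            continue; // guard against a cyclic parent reference
        }
        $seen[$id] = true;
        $out[] = $id;
        foreach ($children[$id] ?? [] as $child) {
            $stack[] = $child;
        }
    }
    sort($out);
    return $out;
}
```

The index is built once and can be cached, so a lookup never depends on WordPress walking a hierarchical taxonomy at save time.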

wp-admin nearly killed us before the front end did

Here is the failure nobody predicts. The public site was fast and the admin was close to unusable. Network screens were taking around twenty seconds to load on a healthy server with a calm database. No slow query alarms, plenty of idle PHP workers, CPU fine. Everything that should have explained it looked normal.

The cost was hiding in the admin bar. WordPress rebuilds the admin bar on every wp-admin request, and as part of that it calls get_blogs_of_user() to populate the My Sites menu. On a normal install that function is trivial. On a network with this many sites it has to work out which of them the current user belongs to, and that turns into a very expensive scan run on every single admin page view. Hiding the My Sites menu does nothing, because the admin bar still initialises and still loads the data before deciding not to show it.

WordPress gives you an escape hatch for exactly this, a filter that short-circuits the function if you return a non-null value:

// Admin only: skip the full network scan, return just the current site
add_filter('pre_get_blogs_of_user', function ($sites, $user_id, $all) {
    if (!is_admin() || wp_doing_ajax() || empty($user_id)) {
        return $sites;
    }

    $blog_id = get_current_blog_id();
    return [$blog_id => (object) ['userblog_id' => $blog_id]];
}, 10, 3);

That one change took admin screens from roughly twenty seconds to a couple of seconds, with no new servers and no schema change. The lesson has stuck with us on every large system since: when something is slow at scale, the bottleneck is almost never where the dashboard is pointing. You find it by turning on the slow request log and reading the stack traces of the requests that actually hurt, not by guessing.

A media library measured in millions

The shared content model creates a media problem most builds never meet. The library is well over a million high-resolution items and grows by hundreds a day, and a single image might need to appear on a suburb page, a country edition and a syndication partner, each wanting a different size and format.

Two decisions carry this.

First, every site reads from and writes to one shared uploads directory rather than the per-site folders multisite gives you by default. That’s a one-filter change to where WordPress thinks uploads live. It also comes with a nasty footgun: because WordPress sees the shared path as a custom upload location, deleting a single site will happily delete the shared media for all of them. If you do this, you wire in guards so a routine site deletion can’t wipe the library. We learned that the careful way rather than the expensive way.

Second, images are not pre-rendered into every possible size up front. A lightweight endpoint resizes on demand, validates that the source is one of our own domains, caches the result on disk under a hash of the URL and dimensions, and serves it with a one-year cache header. The first request for a given crop pays the cost. Every request after that is a static file. With originals and derivatives in cloud storage, a million-item library stops being a storage and CPU fire and becomes mostly cache hits.
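The two checks at the heart of that endpoint can be sketched in plain PHP. This is an illustration under stated assumptions, not the production endpoint: the allow-listed hosts, the sha256 hash, the two-character directory prefix, the .jpg extension and the exact header directive are all hypothetical choices.

```php
<?php
// Hypothetical allow-list; the real domains are not given here.
const ALLOWED_HOSTS = ['example.com', 'media.example.com'];

// One year in seconds (365 * 24 * 3600). The exact directive is an assumption.
const CACHE_HEADER = 'Cache-Control: public, max-age=31536000';

/** True when the source URL is http(s) and its host is on the allow-list. */
function is_allowed_source(string $url, array $allowed = ALLOWED_HOSTS): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    return in_array(strtolower($parts['host']), $allowed, true);
}

/** Deterministic on-disk cache path for one source URL at one size. */
function cache_path(string $cache_dir, string $url, int $width, int $height): string
{
    $key = hash('sha256', $url . '|' . $width . 'x' . $height);
    // Prefix directories keep any single folder from growing too large.
    return rtrim($cache_dir, '/') . '/' . substr($key, 0, 2) . '/' . $key . '.jpg';
}
```

A handler built on these would reject disallowed sources, serve the file at cache_path() if it exists, and otherwise resize, write the file and send CACHE_HEADER with the response.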

Ingestion, classification and the pinning queue

Content comes in from wire services like the Associated Press through a dedicated ingestion pipeline, rather than someone pasting articles into an editor. PHP feed parsers and Node.js fetchers pull stories in, and each one is classified into the locality hierarchy as it arrives, from continent down to the specific area, in seconds. The root site is the hub: it ingests everything once, and the network draws from it.

The part that makes the network feel local is the pinning queue. Editors pin stories into ordered slots at any geographic level. When a given site renders its front page, it doesn’t just show its own pins. It walks up the hierarchy. If the local site has nothing for a slot, it takes the region’s pick, then the state’s, then the country’s, then the world hub’s, with local beating regional, regional beating national, national beating world. Duplicates are stripped, the final list is fetched in a single batched query across the correct sites, and the whole result is cached for a few minutes per site and per queue.
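The fallback walk can be sketched as a pure function. The data shapes and the name resolve_slots are hypothetical, and one detail is an assumption: when a level's pick duplicates a story already placed, this version falls through to the next level up rather than leaving the slot empty.

```php
<?php
/**
 * Resolve a front page's slots by walking up the hierarchy.
 *
 * $chain: site IDs nearest first (the local site, then each parent up to the hub).
 * $pins:  site ID => [slot number => story ID] for one queue.
 * Returns slot number => ['story' => ID, 'site' => ID of the site that pinned it].
 */
function resolve_slots(array $chain, array $pins, int $slot_count): array
{
    $resolved = [];
    $used = [];
    for ($slot = 1; $slot <= $slot_count; $slot++) {
        foreach ($chain as $site_id) {
            $story = $pins[$site_id][$slot] ?? null;
            if ($story === null || isset($used[$story])) {
                continue; // empty slot or duplicate story: try the next level up
            }
            $resolved[$slot] = ['story' => $story, 'site' => $site_id];
            $used[$story] = true;
            break; // nearest level wins for this slot
        }
    }
    return $resolved;
}
```

Because each resolved entry records which site the story came from, the stories can be grouped by site and fetched together, and the finished list cached per site and per queue.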

That’s how one newsroom action can populate thousands of localised front pages without an editor ever touching most of them, while a local editor who wants control still gets the final say on their own page. We built the editorial tooling for it as proper applications loaded inside wp-admin rather than as a pile of meta boxes, which is the same treat-WordPress-as-a-real-runtime discipline applied to the editor experience.

A wire story is never finished

A wire story is not a fixed thing you import once. The Associated Press sends the same story over and over, each time as a new version, and a fast-moving story can go through a dozen of them before it settles. The first version of a breaking story is often a single line with no image at all. The detail fills in over the next few versions, and the photograph might not arrive until version three or four.

So ingestion can’t just mean insert an article. It has to mean reconcile this version against the one we already hold. Every incoming item carries a stable identifier, a version number and a content hash. We match on the identifier, skip anything that isn’t actually newer, and when a real update lands we write it through to the same post: same ID, same URL, same publish date and position in the network, with the body and any new media refreshed underneath. The link a reader shared two minutes ago still works. The story behind it just got better.
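In outline, the reconcile step is a three-way decision. The sketch below assumes a lookup keyed by the wire identifier and illustrative field names; the real storage and feed format are not shown here.

```php
<?php
/**
 * Decide what to do with an incoming wire item.
 *
 * $stored: wire identifier => ['post_id' => int, 'version' => int, 'hash' => string]
 * $item:   ['id' => string, 'version' => int, 'hash' => string]
 * Returns ['action' => 'insert'|'update'|'skip', 'post_id' => int|null].
 */
function reconcile(array $stored, array $item): array
{
    $existing = $stored[$item['id']] ?? null;

    if ($existing === null) {
        return ['action' => 'insert', 'post_id' => null];
    }
    // Not newer, or the content is unchanged: nothing to write.
    if ($item['version'] <= $existing['version'] || $item['hash'] === $existing['hash']) {
        return ['action' => 'skip', 'post_id' => $existing['post_id']];
    }
    // A real update targets the existing post, so its ID and URL stay the same.
    return ['action' => 'update', 'post_id' => $existing['post_id']];
}
```

Returning the existing post ID on update is what keeps a shared link stable: the caller refreshes the body and media on that post instead of creating a new one.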

The harder half is that the wire is not the only thing editing these stories. News.net editors rewrite headlines, correct the locality, promote stories and shape them for their audience, on the same articles the wire keeps updating. A naive importer would wipe an editor’s work the next time a correction came down the wire. So an update has to know when a person is mid-edit or has already shaped a published story, and hold off rather than overwrite. The rule that emerged is easy to say and fiddly to enforce: the wire owns the facts, the editor owns the presentation, and an automated update is never allowed to stomp a human.
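One possible shape for that rule, as a sketch rather than the actual importer: track which presentation fields an editor has changed, and let the wire write everything else. The field names, the edited_fields list and the locked flag are all hypothetical.

```php
<?php
// Assumed split: the fields an editor can claim. Names are illustrative.
const PRESENTATION_FIELDS = ['headline', 'locality'];

/**
 * Apply a wire update without overwriting human work.
 *
 * $post:   current field values, plus 'edited_fields' (fields an editor has changed)
 *          and 'locked' (true while someone has the post open for editing).
 * $update: field values from the new wire version.
 * Returns the post to save, or null when the update should be held for later.
 */
function apply_wire_update(array $post, array $update): ?array
{
    if (!empty($post['locked'])) {
        return null; // someone is mid-edit: hold off rather than overwrite
    }
    foreach ($update as $field => $value) {
        $editor_owns = in_array($field, PRESENTATION_FIELDS, true)
            && in_array($field, $post['edited_fields'] ?? [], true);
        if (!$editor_owns) {
            $post[$field] = $value; // the wire owns everything the editor has not shaped
        }
    }
    return $post;
}
```

A held update is not discarded. The caller would retry it once the lock clears, which is where most of the fiddliness lives.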

Infrastructure that flexes with the news

News traffic is spiky by nature. A normal afternoon and a major breaking story are different platforms wearing the same logo, and the gap between them can be an order of magnitude with no warning.

The platform runs on AWS and adds or removes servers automatically as that traffic rises and falls, so the network is not paying for a breaking-news fleet on a slow Tuesday.

Caching news is harder than caching most things, because the most valuable page is the one that won’t sit still. A story from last week never changes, so it can be cached hard and served almost for free. A breaking story changes every few minutes, and that’s exactly when it’s pulling the most traffic it will ever see. Cache it too aggressively and readers get a stale version of the biggest story of the day. Don’t cache it and every refresh lands straight on the database at the worst possible moment. We lean on Redis with targeted invalidation: cache aggressively by default, and the moment a newer version of a story writes through, drop precisely the keys that story touches. The next reader pays for one fresh render and everyone behind them is served from cache again.
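Targeted invalidation depends on knowing which cache keys a story appears in. The sketch below uses an in-memory class as a stand-in for Redis so it runs without a server. The class name, key names and method names are hypothetical, and in Redis the per-story index would more likely be a set of keys per story.

```php
<?php
/** In-memory stand-in for a Redis-backed cache with a per-story key index. */
final class StoryCache
{
    private array $values = [];   // cache key => rendered value
    private array $keysFor = [];  // story ID => [cache key => true]

    /** Return the cached value, or render it once and record which stories it used. */
    public function remember(string $key, array $story_ids, callable $render)
    {
        if (array_key_exists($key, $this->values)) {
            return $this->values[$key];
        }
        $this->values[$key] = $render();
        foreach ($story_ids as $id) {
            $this->keysFor[$id][$key] = true;
        }
        return $this->values[$key];
    }

    /** Drop only the keys the updated story touches. Returns how many were removed. */
    public function invalidateStory(int $story_id): int
    {
        $dropped = 0;
        foreach (array_keys($this->keysFor[$story_id] ?? []) as $key) {
            if (array_key_exists($key, $this->values)) {
                unset($this->values[$key]);
                $dropped++;
            }
        }
        unset($this->keysFor[$story_id]);
        return $dropped;
    }
}
```

When a newer version writes through, calling invalidateStory() with that story's ID clears the pages that include it and leaves every other cached page alone, so the next reader triggers a single fresh render.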

Ingestion runs in its own environment, separate from the public site, so the constant churn of pulling and classifying wire content never competes with serving readers. It’s the same codebase running in two roles on separate infrastructure, which means a deploy to one never touches the other and a parser problem stays on the parser.

What it actually proves

None of this is exotic infrastructure. It’s AWS, a managed database, a caching layer and a long-lived open-source plugin, assembled by people who read the stack traces and modelled the domain properly. The cleverness is in the data model and in knowing which default to throw away before it bites: hierarchical taxonomies, per-site uploads, the admin bar’s site scan, eager image sizing. Every one of those is fine until it’s the thing taking the site down.

The same thinking applies to anyone weighing a rebuild. When a platform hits a wall, the reflex is often to declare the tool dead and start again on something newer. Most of the time the tool isn’t the wall. The wall is a default nobody questioned and a data model that was never built for the scale it grew into. We make the buyer’s-side version of this case regularly: a rewrite is a very expensive way to avoid an afternoon with a profiler.

If you have a system that has outgrown how it was first built, whether that’s WordPress, a database under strain, or a workflow that buckles every time volume spikes, that’s the kind of problem we like. Show us where it hurts and we’ll tell you honestly whether it needs re-engineering or replacing.
