An Old Trick That May Have Prevented the PA NG911 Outage (A little click-baity – but FACT)

When a major NG911 system, such as Pennsylvania’s, goes down, it sends shockwaves through the public safety industry—and not just locally. The root cause of this recent outage? A “defective operating system.” Initially, officials said there was no maintenance happening at the time—but a follow-up report in PennLive revealed something different.

It turns out that the NG911 vendor had already encountered similar issues at another customer site and issued a patch to address them. That fix worked fine for the first customer, so they pushed it out to other environments, including Pennsylvania’s system. But here’s the catch: not all systems behave the same, even if they’re running identical code. Welcome to the quirky world of IP-based platforms—each one has its own “personality.”

Unfortunately, this update resulted in an unexpected degradation of Pennsylvania’s NG911 service. Should it have happened? No. Could it have been prevented? Absolutely.

The Best Practice: Split-Core Upgrades

The solution is straightforward and time-tested: use split-core upgrade procedures.

Many modern critical systems—including NG911 platforms—are designed with dual-core redundancy. That means two independent processors (or “cores”) are running in tandem. They can operate in:

  • Active-Active Mode: Both cores are live and share the processing load.
  • Active-Standby Mode: One handles all the traffic, while the other is idle but ready to instantly take over.
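To make the difference between the two modes concrete, here is a minimal sketch in Python. Everything in it (the `Core` class, the `Mode` enum, and the `route_calls` function) is a made-up illustration, not any vendor's actual API. It shows only how calls would be spread across cores in each mode.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import cycle


class Mode(Enum):
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_STANDBY = "active-standby"


@dataclass
class Core:
    """Hypothetical stand-in for one redundant core of the platform."""
    name: str
    healthy: bool = True


def route_calls(calls, cores, mode):
    """Return a {call: core name} mapping for the given redundancy mode."""
    available = [core for core in cores if core.healthy]
    if not available:
        raise RuntimeError("no healthy core available")
    if mode is Mode.ACTIVE_ACTIVE:
        # Load sharing: alternate calls across every healthy core.
        picker = cycle(available)
        return {call: next(picker).name for call in calls}
    # Active-standby: the first healthy core carries all the traffic.
    # The standby only appears here once the primary is marked unhealthy.
    return {call: available[0].name for call in calls}


cores = [Core("A"), Core("B")]
print(route_calls(["call-1", "call-2", "call-3"], cores, Mode.ACTIVE_ACTIVE))
print(route_calls(["call-1", "call-2", "call-3"], cores, Mode.ACTIVE_STANDBY))
```

In active-active the calls alternate between A and B; in active-standby they all land on A until A is marked unhealthy, at which point B takes everything.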

With split-core upgrades, here’s what you do:

  1. Split the cores so that only one handles live traffic. The other is taken “offline” for maintenance.
  2. Apply upgrades, patches, or significant changes to the offline core. Since it’s not in use, you can reboot it, tweak it, or even break it and fix it—without affecting live services.
  3. Run a full test suite on the upgraded core. Think of it like your system’s second dress rehearsal before opening night.
  4. Once it passes all tests, flip traffic from the original core to the newly upgraded one.
  5. Closely monitor performance. If anything’s off, flip traffic back instantly. No dropped calls, no public impact.

If the upgraded core runs smoothly, you can then update the second core using the same procedure. Ultimately, both cores are fully upgraded, tested, and ready to resume either load-sharing or failover responsibilities.
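For readers who think in code, the five steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a real NG911 tool: the `Core` class and the `apply_patch`, `run_tests`, and `is_healthy` hooks are hypothetical placeholders for whatever the platform vendor actually provides. One simplification is labeled in the code: if testing or monitoring fails, the sketch just stops and leaves the suspect core out of service for troubleshooting.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical stand-in for one redundant core."""
    name: str
    version: str
    in_service: bool = True


def split_core_upgrade(cores, new_version, apply_patch, run_tests, is_healthy):
    """Upgrade two redundant cores one at a time; one core is always live.

    apply_patch, run_tests and is_healthy are caller-supplied callables
    (assumed hooks for illustration). Returns a log of what happened.
    """
    log = []
    live, offline = cores  # the first core keeps carrying traffic to start
    for _ in range(2):
        # Step 1: split, so only `live` handles traffic.
        offline.in_service = False
        log.append(f"split: {live.name} live, {offline.name} offline")
        # Step 2: patch the offline core; live service is untouched.
        apply_patch(offline, new_version)
        # Step 3: test the upgraded core before it sees any traffic.
        if not run_tests(offline):
            # Simplification: stop here and leave the core offline.
            log.append(f"abort: {offline.name} failed tests, {live.name} stays live")
            return log
        # Step 4: flip traffic to the newly upgraded core.
        offline.in_service, live.in_service = True, False
        live, offline = offline, live
        log.append(f"flip: traffic moved to {live.name}")
        # Step 5: monitor; flip straight back if anything is off.
        if not is_healthy(live):
            live.in_service, offline.in_service = False, True
            log.append(f"flip back: traffic returned to {offline.name}")
            return log
    # Both cores are upgraded; bring the idle one back into service.
    offline.in_service = True
    log.append(f"done: both cores on {new_version}")
    return log


def patch(core, version):
    core.version = version


a, b = Core("A", "1.0"), Core("B", "1.0")
for line in split_core_upgrade([a, b], "1.1", patch, lambda c: True, lambda c: True):
    print(line)
```

The point of structuring it this way is that every exit path, whether success, failed test, or failed monitoring, leaves one known-good core in service.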

And here’s the beauty of split-mode: You can stay in it for hours or days if needed. Many industries—such as finance, airlines, and hospitals—do this all the time to prevent downtime in 24/7 operations. So why isn’t Public Safety doing the same?


A Real-World Example

Between 1998 and 2003, I helped engineer emergency communications and global voice networks for a major financial institution. During that time, upgrades were nearly constant—especially after major mergers. We often worked through entire weekends, swapping hardware, rolling back broken patches, and reworking configs—without ever dropping a single call.

In one instance, our trading floor upgrade in Chicago had a hard four-hour window, but the process typically took five. So we practiced. We defined a go/no-go point. We tested, failed, rolled back, re-tested, and finally nailed it, because we built the plan around failure being part of the process.

No one outside that room knew how close we came to disaster. And that’s how it should be.


Final Thoughts

Split-core upgrades aren’t new. They’re standard operating procedure for critical systems where failure isn’t an option. So why isn’t Public Safety treated with the same level of rigor? Why is this capability not on the MUST HAVE requirement list?

Maybe it’s time for an official best-practices document. Until then—feel free to share this blog. It’s not magic, it’s just good engineering. And I guarantee anyone who’s worked on the service or support side of NG911 would agree: split-core upgrades work.

If you find my blogs informative, I invite you to follow me on X @Fletch911. You can also follow my profiles on LinkedIn and Facebook and catch up on all my blogs at https://Fletch.tv. And be sure to check out my latest project, TiPS: Today on Public Safety, at http://911TiPS.com

Thanks for spending time with me; I look forward to next time. Stay safe and take care.
