Four Vendors Walk Into an ESInet: What Could POSSIBLY Go Wrong!?

California spent serious money building Next Generation 9-1-1, and then hit the kind of “it works in the lab” problem that makes every call taker’s eye twitch. The hot takes are flying, the word “failure” is being used like confetti, and I’m here to say something mildly controversial: the design was not the dumb part. The dumb part was thinking four different implementations would magically behave like one system without a grown-up enforcing the rules. 


Today we’ll be talking about California’s Next Generation 9-1-1 (NG911) program, and whether what went wrong was a failure of design, or a failure of oversight and implementation. 

First, let’s define “failure” like adults

In Public Safety, “failure” is not a tweet. It’s not a headline. It’s when a caller cannot reach help, or when help is delayed because the system did something unpredictable at exactly the wrong time.

California’s NG911 rollout has been publicly associated with serious operational issues in early deployments, including reports of dropped or misrouted calls, and dispatchers feeling like “test subjects.” That is not a vibes problem. That is a risk problem. 

Now, it is also true that California invested north of $450 million to build its NG911 system over multiple years, and then paused and pivoted away from the regional model. That is the kind of sentence that makes taxpayers and call takers do the same thing at the same time: inhale sharply. 

But here is the key point: a program can fail operationally even when the architecture is sound. That’s not a cop-out. That’s real life.


The money and the model: what California actually built

Let’s anchor this in facts, not hot takes.

California’s published transition materials describe a phased program (planning, infrastructure deployment, then PSAP migration) and note that the phased migration was halted after 23 PSAP transitions due to significant operational disruptions. 

Independent reporting and Cal OES statements also consistently point to the same rough scale: more than $450 million spent and a regional approach that split the state into multiple sectors, supported by multiple providers. 

And here is a crucial detail most pundits skip because it’s not spicy enough: the National 911 Program’s “Lessons Learned” document describes California’s model as multi-regional, with three different NGCS (NG9-1-1 Core Services) cores and four different NGCS providers in an interoperable model. Under that approach, a statewide provider was intended to coordinate interoperability through an interface control document aligned with NENA i3. 

That matters, because it tells us the intent was not chaos. The intent was resiliency and interoperability.


“Was the design bad?” No. It was normal networking

I do not agree with the pundits who claim the regionalized architecture was inherently a poor design. In fact, at a high level, it aligns with how i3 thinking and standard network design work: build independent domains that can operate locally, interconnect them, and add redundancy above them.

If you’re in your 20s and new to this, here’s the simplest analogy I can offer:

  • Your phone does not rely on one cell tower.
  • Your streaming app does not rely on one server.
  • The internet does not rely on one router.

We build systems in “layers” so that when one part has a bad day, the entire state does not have a bad week.

A regionalized ESInet approach, with an “umbrella” layer above it for inter-region connectivity and backup NGCS capability, is conceptually similar to how large-scale networks are designed. It is also similar to how cloud environments are structured, with multiple zones and regions that can fail without ending the world.

There is no magic here. This is networking. It should be explainable without summoning a wizard, and it definitely should not require a rare skill set that only exists on a mountain top. California’s own lessons-learned documentation frames the approach as an interoperability opportunity, not an architectural impossibility. 
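If you want to see the shape of that idea in code, here is a minimal sketch in Python, assuming a made-up two-region topology: calls resolve inside their home region first, and the umbrella layer is consulted only when the region has a bad day. The region names, health flags, and URIs are all hypothetical, illustration only, not anyone’s production design.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """One regional ESInet/NGCS domain (names are hypothetical)."""
    name: str
    healthy: bool
    psap_uri: str  # where this region delivers calls

# Hypothetical topology: independent regions plus an umbrella backup.
REGIONS = {
    "north": Region("north", healthy=True,  psap_uri="sip:psap@north.example"),
    "south": Region("south", healthy=False, psap_uri="sip:psap@south.example"),
}
UMBRELLA_BACKUP_URI = "sip:backup@umbrella.example"

def route_call(home_region: str) -> str:
    """Prefer the caller's home region; fall back to the umbrella layer."""
    region = REGIONS[home_region]
    if region.healthy:
        return region.psap_uri  # normal day: the call stays local
    return UMBRELLA_BACKUP_URI  # bad day: the umbrella absorbs it

print(route_call("north"))  # sip:psap@north.example
print(route_call("south"))  # sip:backup@umbrella.example
```

Notice that the “south” region failing does not take “north” down with it. That is the entire point of the layering.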


Where it likely went sideways: interoperability without authority

Here is the part where I get gently snarky, but still respectful.

A multi-vendor model can work. It can even be a smart resiliency choice. But multi-vendor only works when “interoperable” means “verified,” not “promised.”

The National 911 Program lessons-learned report describes how Cal OES established that the statewide provider (PNSP) would be the authoritative source for defining an interface control document (ICD) for exchanging data, and that ICD was required to be based on NENA i3 standards. 

That is exactly the right idea. The ICD is basically the rulebook that says:

  • “This is what a call looks like.”
  • “This is how location is represented.”
  • “This is how we transfer.”
  • “This is how we fail over.”
  • “This is what ‘compliant’ means in actual packets on the wire.” (A toy version of that last check is sketched just below.)
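Here is what that last rule could look like as a machine-checkable test instead of a promise. This is a toy sketch: the required fields and the validator are hypothetical stand-ins, not the actual Cal OES ICD or the NENA i3 schema.

```python
# Toy ICD rule: every call handoff must carry these fields.
# The field names are hypothetical stand-ins, not the real ICD.
REQUIRED_FIELDS = {"call_id", "location", "callback_number", "transfer_target"}

def check_icd_compliance(call_payload: dict) -> list[str]:
    """Return a list of violations; an empty list means the payload passes."""
    missing = REQUIRED_FIELDS - set(call_payload)
    return [f"missing required field: {field}" for field in sorted(missing)]

# Vendor A sends everything; Vendor B "interpreted" the spec differently.
vendor_a = {"call_id": "123", "location": {"lat": 34.05, "lon": -118.25},
            "callback_number": "555-0100",
            "transfer_target": "sip:psap@region.example"}
vendor_b = {"call_id": "456", "location": {"lat": 34.05, "lon": -118.25}}

print(check_icd_compliance(vendor_a))  # [] -> compliant
print(check_icd_compliance(vendor_b))  # two violations -> caught before go-live
```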

Now the hard truth: a rulebook that is not enforced is just fan fiction.

If regional carriers are left to implement “their interpretation” of NG911, you get what every IT person on earth recognizes instantly: compatibility problems. The ports are “open,” the circuits are “up,” the dashboards are “green,” and yet the system still behaves like a shopping cart with one bad wheel.
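And to be fair to the dashboards, here is the difference in miniature, as a hypothetical Python sketch: a port check proves a socket opens; only a synthetic end-to-end test call proves the system works. The harness hook and result fields are invented for illustration.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """The 'dashboard green' check: a TCP socket opens. That is ALL it proves."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def synthetic_call_succeeds(place_test_call) -> bool:
    """The check that matters: an end-to-end test call is answered with its
    location intact. `place_test_call` is a hypothetical harness hook."""
    result = place_test_call()
    return result.get("answered", False) and "location" in result

# Stub standing in for a real test-call harness (illustration only).
def fake_harness():
    return {"answered": True, "location": {"lat": 38.58, "lon": -121.49}}

print(synthetic_call_succeeds(fake_harness))  # True only if the call truly works
```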

That is not a design failure. That is a governance and implementation failure.

And California’s own transition summary points directly at the pain point: the four-service-provider regional model created interdependencies and delayed issue resolution across providers and PSAPs. 

Translation into plain English: when something breaks, you get finger-pointing instead of fixes, because nobody has the authority to make the fix stick across the whole ecosystem.


“Scrap the network and start over” makes me twitch

I am a little shocked by the idea of scrapping the existing network and starting over, at least as a default posture.

Why? Because in most large systems, the hardware and circuits are not the “magic.” They are plumbing. Expensive plumbing, yes, but still plumbing. You do not remodel your entire house because the bathroom faucet drips. You replace the faucet. Or, if you’re me, you replace the faucet after staring at it for three hours and muttering words that would get bleeped on daytime TV.

California’s own stated baseline objectives for the next approach include utilizing as much of the previous project investment and technology as possible, which tells me even the state recognizes there is value in what already exists. 

So if the new plan ends up reusing infrastructure, then the key question becomes:

What, exactly, are we throwing away, and why?

Because if the real issue was software behavior, interoperability enforcement, testing responsibility, and cutover process, then a full “rip and replace” might be the most expensive way to learn a lesson we already know.


The real “magic” is software, and software is portable

Let’s say the quiet part out loud: the magic is software.

Software is where routing logic lives. Policy lives there. Interop logic lives there. Monitoring, alarms, failover behavior, security rules, certificate handling, and the thousand tiny decisions that turn “a call” into “a working call” all live in software.

And software does not care if the server is named “Bob” or “Susan” or “CAL-ESINET-CORE-03.” It cares about configuration, standards adherence, and whether the people operating it have a controlled process.

That means you do not fix software problems by changing the brand of rack screws. You fix software problems by fixing requirements, implementation, integration, and test methodology.
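To make “test methodology” less abstract, here is one hypothetical example of the kind of integration test that should pass before any PSAP takes live traffic: a cross-provider transfer must never drop location. The transfer function below is a stand-in; a real harness would drive two vendors’ actual systems.

```python
def transfer_call(call: dict, to_provider: str) -> dict:
    """Hypothetical stand-in for an inter-provider transfer; a real harness
    would exercise two vendors' actual systems here."""
    handed_off = dict(call)
    handed_off["provider"] = to_provider
    return handed_off

def test_transfer_preserves_location():
    """The rule an ICD implies: a transfer must never drop location."""
    call = {"call_id": "789", "provider": "vendor_a",
            "location": {"lat": 34.05, "lon": -118.25}}
    result = transfer_call(call, "vendor_b")
    assert result["location"] == call["location"], "location lost in transfer"

test_transfer_preserves_location()
print("transfer test passed")  # run this in a lab, not on live 9-1-1 traffic
```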

California’s transition plan also recognizes operational burden during transition, noting that PSAPs had to handle both legacy and NG911 call flows, and that testing burden hit understaffed centers. 

That is a giant red flag for any deployment: if the folks doing mission-critical operations are also forced to become your integration lab, you are borrowing reliability from tomorrow to pay for today.


Why does the whole state need to cut over at once? It doesn’t.

One of the strangest ideas in big-government technology projects is the fantasy of the “big bang cutover,” where everyone switches at once and it’s beautiful and nobody screams.

In reality, the best NG911 deployments are phased, controlled, and reversible.

It seems to me each region should be operationally autonomous. Each should have the ability to build, test, and go live on its own schedule when it is ready. The state “umbrella” network that ties regions together is not even required until you have at least two regions live, and even then its first real value is inter-region transfers and broader resilience.

For most PSAPs, day-to-day operations are local and regional. The umbrella matters for the edge cases, the border areas, and the true disaster scenarios. That is important, but it is not a prerequisite to start delivering real NG911 value inside a region.
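If you want to picture how “phased, controlled, and reversible” becomes operational, here is a toy sketch of a per-region go-live gate. The criteria names are hypothetical examples, not an actual Cal OES or NENA checklist; the point is that go-live is a checklist with a rollback path, not a date on a calendar.

```python
# Hypothetical go-live gate for one region. The criteria are examples only.
def region_ready(status: dict) -> bool:
    gates = [
        status["icd_conformance_passed"],  # packets verified, not promised
        status["synthetic_calls_passed"],  # end-to-end test calls succeed
        status["rollback_tested"],         # we proved we can go back
        status["staff_trained"],           # PSAP staff are not test subjects
    ]
    return all(gates)

north = {"icd_conformance_passed": True, "synthetic_calls_passed": True,
         "rollback_tested": True, "staff_trained": True}
south = {"icd_conformance_passed": True, "synthetic_calls_passed": False,
         "rollback_tested": True, "staff_trained": True}

print(region_ready(north))  # True  -> this region cuts over
print(region_ready(south))  # False -> this region waits; nobody else is blocked
```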

California’s own planning narrative, and external reporting, indicate a pivot to a statewide primary provider plus a backup provider to reduce handoffs and interfaces. That is one way to reduce complexity, but it still does not require a single statewide “everyone go live Tuesday at 2 PM” moment. 


Standards matter, and “ANSI approved” is not just a gold star sticker

Let’s talk standards in a way that does not induce a nap.

A standard is basically a shared agreement that says:

“When we both say ‘location,’ we mean the same structure, the same fields, the same behavior, and the same validation rules.”

NENA i3 is not perfect, but it is the closest thing the industry has to a functional blueprint that lets different systems behave like one. California’s own “Lessons Learned” documentation specifically references an interoperability approach tied to NENA i3 via an ICD requirement. 
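Here is what that shared agreement buys you in miniature, sketched in Python: when everyone serializes “location” the same way, one parser can validate every vendor’s output. The structure below is a toy, not PIDF-LO or the actual i3 data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    """Toy shared 'location' structure; a real system would use the
    standardized representation (e.g., PIDF-LO), not this sketch."""
    lat: float
    lon: float
    uncertainty_m: float  # same field, same unit, for every vendor

def parse_location(raw: dict) -> Location:
    """Every vendor must produce something this parser accepts, or they fail
    conformance before go-live, not during a 3 AM emergency."""
    return Location(lat=float(raw["lat"]), lon=float(raw["lon"]),
                    uncertainty_m=float(raw["uncertainty_m"]))

same_for_everyone = parse_location({"lat": 38.58, "lon": -121.49,
                                    "uncertainty_m": 25.0})
print(same_for_everyone)
```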

And yes, i3’s ANSI approval matters. To a non-technical person, it sounds like a plaque that gets mounted in a hallway. To the people who have spent years grinding through definitions, test cases, and interoperability arguments, it’s evidence that the work was formalized, reviewed, and made durable.

More importantly: standards let you hire normal IT talent.

You are not building a one-off science project. You are building infrastructure.


My “children on the playground” theory

Here is my working theory, stated simply:

California’s failure was not primarily architectural. It was programmatic.

Four capable vendors can share a playground, but without a grown-up enforcing the rules, recess turns into chaos. The state essentially subcontracted the coordination of connectivity across multiple providers, but did not retain enough centralized authority to force consistent implementation across regions. When differences in interpretation appeared, interoperability suffered. When issues appeared, resolution slowed because of interdependencies.

California’s transition summary bluntly calls out those interdependencies as a key issue. 

This is where my old-guy wisdom kicks in:

If nobody owns the system end-to-end, the system will behave like nobody owns it.

That is why you need explicit rules, explicit test gates, explicit cutover criteria, and real authority to demand compliance.

And if that authority does not exist in the next plan, then any future endeavor remains susceptible to the same category of failure, whether you use one vendor or four.


A timely note: oversight is now part of the public conversation

The fact that California’s NG911 effort has become a legislative and transparency issue tells you something important: this is no longer just a technical project. It is a governance story.

NBC Bay Area reports that a proposed “Fix 911 Act” (SB 985) would require quarterly reporting on spending, timelines, and progress, specifically in response to cost overruns, delays, and concerns about how money was spent. 

I am not taking a political position here. I am taking a Public Safety position: when systems are mission-critical, oversight is not optional.


The bottom line, in plain language

If you’re new to Public Safety and you want the lesson you can carry into the next 20 years of your career, it’s this:

NG911 is not a mystery. It’s networking plus standards plus governance.

  • The regional architecture concept is defensible.
  • The infrastructure likely has reusable value. 
  • The pain shows up when interoperability is not enforced, testing is mis-scoped, and cutovers are handled like a leap of faith instead of a controlled engineering process. 

And if you’re a senior leader reading this, hoping there is still common sense left in the world: there is. The industry already knows how to do this. We just have to stop pretending that “vendor cooperation” is a substitute for “documented requirements with enforcement authority.”

If you find my blogs informative, I invite you to follow me on X @Fletch911. You can also follow my profiles on LinkedIn and Facebook and catch up on all my blogs at https://Fletch.tv. AND BE SURE TO CHECK OUT MY LATEST PROJECT TiPS: Today on Public Safety @ http://911TiPS.com

Thanks for spending time with me; I look forward to next time. Stay safe and take care.




© 2026, All Rights Reserved, Fletch 911, LLC
Reuse and quote permitted with attribution and URL
