The perception gap is the primary diagnostic.
The thesis
The single most useful number the Ops Health Check produces isn't any individual category score. It's the gap between two scores measuring the same operation.
A leader scores the operation. A senior person on the leader's team scores the same operation. The difference between those two scores — the perception gap — is the most reliable predictor of operational dysfunction we have surfaced in three years of running this instrument across mid-market operations.
This memo lays out why the gap is more diagnostic than the score, where the gap comes from structurally, what specific gap patterns mean, and what changed in the assessment when we made the gap the headline metric instead of a footnote.
Where the gap comes from
Every operating-system assessment ever built — whether ours, an EOS rollup, a Lean maturity model, or a private-equity-sponsor scorecard — has the same structural problem. The respondent rates what they perceive. What the respondent perceives is filtered by their proximity to the operation.
The CEO and the operations lead at the same company are rating two different things when they answer the same question. The CEO is rating the operation as designed — the intent, the architecture, the stated priorities. The operations lead is rating the operation as executed — what shows up in the inbox on Tuesday morning, what happens when a decision actually needs to be made, what the team experiences in the gap between intent and delivery.
Both are giving honest answers. Both are giving accurate answers to the question they think they're being asked. The question is just structurally different from each chair.
This is what we call the proximity discount. Leaders score what's intended; teams score what's lived. The gap between intent and reality is what an outside operating partner would walk into a portco and surface in the first 30 days. The two-layer assessment is the structured version of the same diagnostic.
The pattern that holds across operations
Across the assessments run since the two-layer model launched, the pattern is consistent enough to be a near-rule:
CEOs score the operation one to one-and-a-half points higher than their teams on most categories.
This is true at $5M operations. It is true at $200M operations. It is true in services, manufacturing, software, and industrial. It does not depend on the leader's personality. It does not depend on whether the team is afraid to be honest. It is structural.
The few exceptions reveal more than the rule. When the CEO scores lower than the team, one of three things is usually true: (a) the CEO is unusually self-critical and routinely under-reads operating health, (b) the team is deferring to leadership in their scoring out of cultural or political habit, or (c) the team has accumulated workarounds that work so reliably they've forgotten what's broken upstream.
When the gap is narrower than 0.5 points, leadership and team are operating in a genuinely shared reality, which is itself an operating accomplishment, since the default is the proximity discount.
When the gap is wider than 2.0 points on any single category, the leadership and team are operating in materially different versions of the same business. The two-layer report flags this as the most diagnostic finding in the assessment.
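As a rough illustration of the arithmetic described above, the sketch below computes a per-category gap (leader score minus team score) and labels it with the two thresholds named in this section. The function names, the sample scores, and the choice to measure gap width by absolute value are assumptions made for illustration; they are not the instrument's actual implementation.

```python
# Thresholds taken from the memo; all names below are hypothetical.
SHARED_REALITY_BELOW = 0.5   # gap narrower than this: shared reality
MATERIAL_SPLIT_ABOVE = 2.0   # gap wider than this on any category: flagged


def category_gaps(leader, team):
    """Return {category: leader score minus team score} for categories both scored."""
    return {cat: leader[cat] - team[cat] for cat in leader if cat in team}


def read_gap(gap):
    """Label a single category gap using the two thresholds above."""
    size = abs(gap)  # assumption: gap width ignores direction
    if size < SHARED_REALITY_BELOW:
        return "shared reality"
    if size > MATERIAL_SPLIT_ABOVE:
        return "materially different"
    return "proximity discount"


# Made-up scores for illustration only.
leader = {"Decision Rights": 8.0, "Accountability": 7.0, "Data & Metrics": 7.5}
team = {"Decision Rights": 5.5, "Accountability": 6.0, "Data & Metrics": 7.25}

gaps = category_gaps(leader, team)
labels = {cat: read_gap(gap) for cat, gap in gaps.items()}

# Widest gap first, since that is where the memo says to look first.
for cat, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{cat}: gap {gap:+.2f} ({labels[cat]})")
```

Keeping the sign in the stored gap preserves the exceptional case where the leader scores lower than the team, while the label depends only on width.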
What different gap profiles tell you
The shape of the gap matters as much as the size. Four profiles cover most of what we see.
Profile 1 · Narrow across all categories (gap <0.5 on every category)
The team and leadership are aligned. This is the rarest profile — about 12% of assessments. Operations in this profile have usually done deliberate work to close the gap, typically through some combination of disciplined cadence, consistent decision-rights enforcement, and visible accountability. The risk in this profile is complacency: the alignment makes the operation easier to defend than to improve. The aligned view means the team is reading the same weaknesses leadership is — including the weaknesses leadership hasn't acted on.
Profile 2 · Medium gap on one or two categories, narrow elsewhere (gap 0.5–1.0 on 1–2 categories)
The team agrees with leadership on most of the operation. The categories where the gap widens are where intent and reality have drifted out of alignment. These are usually transmission problems rather than strategy problems — the leader knows what the category should look like; the team is experiencing something different. The fix is usually in the cadence: the gap closes when the category gets named explicitly in a recurring leadership forum.
Profile 3 · Wide gap on multiple categories (gap >1.0 on three or more categories)
This is the most common diagnostic profile (about 35% of two-layer assessments). The team and leadership are operating in materially different realities across most of the operating system. The cause is almost always one of three structural patterns:
- The operating system was built by someone else (predecessor leadership, founder who has stepped back) and never refactored to the current operator's style. The team's read is anchored to the prior system; leadership's read is anchored to the intended new system; the gap is the unresolved transition.
- The operation has scaled past leadership's direct visibility. The team is experiencing what's actually happening; leadership is experiencing what was true 18 months ago when the operation was small enough to feel in detail.
- The team has stopped surfacing problems because surfacing them hasn't produced response. The gap reflects the team's accumulated decision to operate quietly around issues that would otherwise create friction.
Profile 4 · Wide gap on a single category (gap >2.0 on one category, narrow elsewhere)
The most pointed diagnostic. A single-category gap of 2.0 or more means the team is experiencing a structural issue in this category that is invisible from leadership's seat. Three patterns produce this profile:
- Decision Rights gap of 2.0+ — the team experiences a bottleneck or fog field the org chart doesn't show.
- Accountability gap of 2.0+ — the team is watching standards get missed without consequence; leadership believes the response system is working.
- Data & Metrics gap of 2.0+ — the team is operating with metrics they don't trust; leadership is operating with the same metrics with confidence.
Each pattern has a specific install. The diagnostic value is in the category itself — the gap tells you where to look first.
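The four profiles above can be read as a small decision procedure. The sketch below is one possible encoding, assuming gaps are already computed per category as leader minus team. The function name, the order in which profiles are checked, the "unclassified" fallback, and the two placeholder category names are all assumptions; the memo itself says only that the four profiles cover most assessments, not all of them.

```python
# Thresholds as stated in the profile headings; all names are hypothetical.
NARROW_BELOW = 0.5      # Profiles 1 and 2: gaps under this count as narrow
WIDE_ABOVE = 1.0        # Profile 3: gaps over this count as wide
SINGLE_WIDE_MIN = 2.0   # Profile 4: a gap of 2.0 or more on one category


def classify_profile(gaps):
    """Map {category: gap} to one of the four profiles, or 'unclassified'."""
    sizes = [abs(g) for g in gaps.values()]
    if not sizes:
        raise ValueError("need at least one category gap")

    narrow = sum(s < NARROW_BELOW for s in sizes)
    medium = sum(NARROW_BELOW <= s <= WIDE_ABOVE for s in sizes)
    wide = sum(s > WIDE_ABOVE for s in sizes)
    very_wide = sum(s >= SINGLE_WIDE_MIN for s in sizes)

    if narrow == len(sizes):
        return "Profile 1"
    if very_wide == 1 and narrow == len(sizes) - 1:
        return "Profile 4"
    if wide >= 3:
        return "Profile 3"
    if medium in (1, 2) and narrow + medium == len(sizes):
        return "Profile 2"
    return "unclassified"  # assumption: shapes the four profiles do not cover


def sample(values):
    """Build a made-up five-category gap set; the last two names are placeholders."""
    names = ["Decision Rights", "Accountability", "Data & Metrics",
             "Category 4", "Category 5"]
    return dict(zip(names, values))


print(classify_profile(sample([0.2, 0.1, 0.4, 0.0, 0.3])))  # all narrow
print(classify_profile(sample([0.8, 0.2, 0.6, 0.1, 0.3])))  # two medium
print(classify_profile(sample([1.2, 1.5, 1.1, 0.4, 0.2])))  # three wide
print(classify_profile(sample([2.4, 0.3, 0.2, 0.1, 0.4])))  # one very wide
```

Profile 4 is checked before Profile 3 so that a single very wide category with everything else narrow is not absorbed into a broader bucket; a real implementation might resolve overlaps differently.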
What changed when we made the gap the headline
Through the first iteration of the Ops Health Check, the report led with the composite score and the per-category breakdown. The perception gap was a footnote — a small visual element near the bottom that operators frequently missed entirely.
Three patterns showed up reliably:
- Operators read the composite, recognized themselves in the per-category breakdown, and closed the report without engaging the gap data at all.
- When two operators from the same company independently took the assessment, the comparison was the most useful conversation the assessment ever produced — but the report didn't structure that conversation, so it happened ad hoc or not at all.
- The kits we recommended based on composite scores landed less well than the kits we recommended based on gap signal. Composite-driven recommendations tended to surface what the operator already knew was wrong. Gap-driven recommendations surfaced what the operator hadn't yet named.
We rebuilt the report so the perception gap renders as a top-tier visual after the archetype headline and the category bands. The gap is now the third thing an operator reads, not the seventeenth thing they don't.
The change moved the install-conversion rate on recommended kits by a measurable margin. More importantly, it changed the conversation the assessment provokes inside leadership teams from "what's our score" to "where do we and our team see the operation differently." The second conversation is the one that produces structural change.
How to read your own gap
If you've completed the assessment with both leadership and team respondents, the perception gap renders inline in your report. The categories with the widest gaps are ranked at the top. Each comes with a one-paragraph diagnosis specific to the gap shape.
If you've completed it solo, the most useful single next move is to have one senior leader on your team take the same assessment, separately, without seeing your scores. The two reports render side-by-side with the gap as the headline.
A practical test you can run without the formal instrument: pick three operating categories that matter most. Rate each, honestly, 1-10. Then ask three senior leaders on your team to do the same rating, separately, without seeing your scores. Compare. Where the gap is wider than 2 points, you have a perception problem that's almost certainly costing operating performance. Where the gap is wider than 3 points, you have an urgent issue that won't fix itself.
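For anyone who prefers to tally the informal test in a script rather than on paper, here is a minimal sketch. It assumes the three senior leaders' ratings are averaged per category before comparing against your own rating; the memo does not say how to combine them, so that choice, the function name, and the sample ratings are all illustrative assumptions.

```python
from statistics import mean

# Thresholds from the practical test, on the 1-10 scale it uses.
PROBLEM_ABOVE = 2.0
URGENT_ABOVE = 3.0


def informal_gap_test(own, team_ratings):
    """Compare your ratings with the average of your senior leaders' ratings."""
    results = {}
    for category, mine in own.items():
        # Assumption: combine the separate team ratings with a simple mean.
        team_avg = mean(r[category] for r in team_ratings)
        gap = abs(mine - team_avg)
        if gap > URGENT_ABOVE:
            verdict = "urgent"
        elif gap > PROBLEM_ABOVE:
            verdict = "perception problem"
        else:
            verdict = "no flag"
        results[category] = (round(gap, 2), verdict)
    return results


# Made-up ratings for illustration only.
own = {"Decision Rights": 8, "Accountability": 7, "Data & Metrics": 9}
team_ratings = [
    {"Decision Rights": 5, "Accountability": 6, "Data & Metrics": 5},
    {"Decision Rights": 6, "Accountability": 7, "Data & Metrics": 6},
    {"Decision Rights": 5, "Accountability": 6, "Data & Metrics": 4},
]

result = informal_gap_test(own, team_ratings)
for category, (gap, verdict) in result.items():
    print(f"{category}: gap {gap} ({verdict})")
```

Averaging hides disagreement among the three leaders themselves; comparing against each leader individually is an equally reasonable reading of the test.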
The gap is the data. The gap is the work.
What the gap is not
A few clarifications on what the perception gap doesn't mean, because operators sometimes draw the wrong conclusion:
The gap is not a measure of who is right. Both parties are giving accurate answers from different proximity to the operation. The work is closing the gap, not winning the argument about which score reflects "real" performance.
The gap is not necessarily a leadership flaw. Operations with strong leaders show the gap too — often more pronounced, because strong leaders score what they're working toward, not what's currently true. The gap is structural, not personal.
The gap is not always bad. A gap of 0.5 to 1.0 points is normal and probably healthy. It reflects the structural proximity discount. The gap that demands intervention is the 2.0+ on any single category, or the 1.5+ across three or more categories.
The gap is not a one-time measurement. It moves as the operation evolves. Quarterly reassessment is the right cadence; annual is too slow to catch the patterns before they become structural.
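The intervention rule stated above (2.0+ on any single category, or 1.5+ across three or more categories) is simple enough to express directly. The sketch below is one way to write it; the function name and the use of absolute gap size are assumptions.

```python
def needs_intervention(gaps):
    """Apply the memo's intervention rule to {category: gap}."""
    sizes = [abs(g) for g in gaps.values()]  # assumption: direction is ignored
    single_category = any(s >= 2.0 for s in sizes)
    across_three = sum(s >= 1.5 for s in sizes) >= 3
    return single_category or across_three


# Made-up gap sets for illustration only.
print(needs_intervention({"A": 0.7, "B": 1.0, "C": 0.5}))  # normal range
print(needs_intervention({"A": 2.1, "B": 0.3, "C": 0.2}))  # one wide category
print(needs_intervention({"A": 1.6, "B": 1.5, "C": 1.7}))  # three at 1.5+
```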
Implications for the rest of the assessment
Making the perception gap the primary diagnostic has cascaded into three other parts of the instrument:
Category recommendations are now gap-weighted. A category scoring 0/0 (both leadership and team rate it neutral) is treated differently than a category scoring +1/−1 (leadership rates it healthy, team rates it weak). Same category-level value; very different operating priority.
The archetype headline accounts for gap shape. The 7 operator personas (Founder Bottleneck, Heroic Operator, Data Mirage, Quiet Drift, Crisis Operator, PE Portfolio Operator, Compounding Operator) are partly defined by the gap profile, not just the score profile.
The AI Readiness Index has the gap baked in. Operations with wide perception gaps on Data & Metrics get a lower AI Readiness Index even when leadership rates the data layer highly — because the team's lower read is the better signal for whether AI deployment will produce gain or amplify confusion. The full methodology is in Memo 007.
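To make the gap-weighting point concrete, the sketch below works through the 0/0 versus +1/−1 example: both categories average to the same level, but only one carries a gap. The memo does not specify the actual weighting formula, so the ranking rule here (widest gap first, then lowest level) and all names are hypothetical.

```python
def category_signal(leader, team):
    """Summarize one category as its average level and its perception gap."""
    return {"level": (leader + team) / 2, "gap": leader - team}


def priority_key(signal):
    """Hypothetical ordering: widest gap first, then lowest level."""
    return (-abs(signal["gap"]), signal["level"])


aligned = category_signal(0, 0)    # both rate the category neutral
split = category_signal(+1, -1)    # leadership healthy, team weak

# Same category-level value in both cases.
print(aligned["level"], split["level"])

ranked = sorted(
    [("aligned", aligned), ("split", split)],
    key=lambda item: priority_key(item[1]),
)
print([name for name, _ in ranked])
```

A score-only ranking cannot separate these two categories; any ordering that uses the gap will put the split category first.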
What to do with the gap finding
Three moves the data supports.
Share the report with the team respondent within 48 hours of receiving it. The act of sharing closes some of the gap immediately. It signals that both views are valid data and the gap is information, not judgment.
For the widest-gap category, schedule a 30-minute conversation with the team respondent. Not to defend your score. To ask: "Here's what I see in this category. Here's what you see. What am I missing?" That conversation is the install, not the kit.
Pick the kit that maps to the widest-gap category and start there. The report's recommendation engine sequences this for you. The category with the largest gap is the one where intervention produces the most visible operating change in the shortest cycle.
Gap-driven intervention is what makes the assessment a diagnostic tool instead of a self-report.
Methodology footnote
The two-layer model in production today supports up to four respondent layers in the underlying schema (leadership / senior leader / mid-management / individual contributor), but the v1 report surfaces only the two most diagnostic — leadership vs. senior leader. Four-layer reporting is on the roadmap when sample sizes support it without compromising respondent anonymity.
Gap thresholds (narrow / medium / wide / single-category-wide) are calibrated against the distribution of two-layer assessments completed since the model launched. Thresholds will re-calibrate annually as the sample grows. The diagnostic logic — that the gap is more diagnostic than the score — does not depend on the specific thresholds.
Memo 006 lives at /memo/006. Cited by the Ops Health Check report's perception-gap section. See also: Memo 007 (AI Readiness Index Methodology), Memo 005 (the five-category locked design), and the Memo 007 (/memo/007) sub-spec, Risk-Framing Archetypes.md, from the Design Office for the risk-headline mechanics that key off this gap data.