Playbook kit

Ops Health Check kit.

Twelve-question diagnostic across the five Assess categories. Stage-adjusted scoring, top-three leverage risks ranked by cost of delay.

Download the kit →

Read the Assess phase

§ Sample artifact · what the install produces

Sample output — Per-founder scored report.

§ Install order · five steps

How it actually goes in.

STEP · 01

Open the assessment in a quiet window.

Close email, close Slack. The accuracy of the report is proportional to the honesty of the answers, and honest scoring requires uninterrupted attention. Twelve minutes, one sitting.

Day 1 · 12 minutes

STEP · 02

Read the stance ruler before each answer.

For each question, read the ruler placement before placing yourself. The instrument is calibrated against optimism bias; if you're between two scores, choose the lower one.

During the assessment

STEP · 03

Read the report top-down.

Archetype headline first, then category breakdown, then perception gap, then AI Readiness Index. Don't skip to the composite. The order is sequenced by diagnostic priority.

Within 24 hours

STEP · 04

Share the report with one senior leader.

Their reaction is data. If they push back on a score, that pushback usually points to a perception gap worth measuring formally with the two-layer extension.

Within 48 hours

STEP · 05

Pick the top-ranked risk; install the recommended kit.

The report ranks risks by cost of delay. Start with the kit named for risk #1. The kit guide and companion essay cover the install detail.

Next 30 days

§ Outcomes scorecard

What good looks like, ninety days in.

Time to clearer next move

12 min

Operators report having a sharper sense of operating priorities by the time they finish the report.

Top three risks ranked by cost of delay

3 / 3

Every report ranks the top three operational risks with the kit recommended to close each.

Perception gap visible (two-layer mode)

±2.0

The gap between leadership and team scores surfaces as the headline diagnostic when both respondents complete the assessment.

Cost

Free, no credit card, no account required. The diagnostic is the front door of every engagement decision.

§ The operator narrative

Why this kit is worth installing.

The Twelve Minutes That Re-Frame the Next Quarter

There is a class of operating diagnostic that takes a full week, requires a consultant on-site, and produces a 40-page deck nobody reads. There is another class that takes ninety seconds, produces a generic score, and tells the operator nothing they didn't already know.

The Ops Health Check is built deliberately to occupy neither category. Twelve minutes. Twelve questions. Five categories. A scored report that ranks the top three operating risks by cost of delay and routes the operator to the specific kit most likely to close each gap.

Most operators who run it tell me the same thing afterward, in some version: "I knew most of this. I didn't know how to rank it." The diagnostic value is not in surfacing problems the operator hadn't noticed. It's in producing a structured framing that converts vague operating frustration into a specific install sequence.

That is what the front-door kit does. This essay covers why the design choices in the kit matter, what makes the twelve-minute experience produce real diagnostic value, and what operators are supposed to do with the report once they have it.

The Five Categories That Cover the Operating System

The five Ops Check categories — Strategy Clarity, Operating Cadence, Decision Rights, Accountability, Data & Metrics — are not arbitrary. They emerged from a structural observation: every consequential operating failure I have watched at mid-market scale resolves to a weakness in one of these five areas. Most failures resolve to weakness in two or three of them simultaneously.

The five are also chosen to be non-overlapping in a specific way. Strategy Clarity is about what the operation is trying to do. Operating Cadence is about the rhythm that converts strategy into decisions. Decision Rights is about who decides what. Accountability is about what happens when standards are missed. Data & Metrics is about whether the measurement layer underneath the rest is trustworthy.

Each category has a specific install path. Strategy Clarity → OKR / KPI Tree Builder. Operating Cadence → Cadence Calendar. Decision Rights → Decision Rights Matrix (often paired with Twenty-Decision Audit). Accountability → Function Dashboard Kit + Weekly Retro Template. Data & Metrics → Metric Trust Register + Function Dashboard Kit.

The assessment's diagnostic value comes from ranking the categories by cost of delay rather than by raw score. A category scoring -1.0 in isolation has lower cost of delay than the same category scoring -1.0 when an adjacent category is also weak (compound risk). The ranking algorithm accounts for this; the report sequences your top three risks in the order that produces the biggest operating gain per quarter of focused work.
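To make the compound-risk idea concrete, here is a minimal sketch of ranking by cost of delay rather than raw score. The adjacency map, the 0.5 compounding weight, and the function names are assumptions invented for this illustration; the essay does not publish the real ranking algorithm. Only the category and kit names come from the install paths above.

```python
# Hypothetical sketch: rank categories by a compound-adjusted cost of delay.
# ADJACENT and COMPOUND_WEIGHT are illustrative assumptions, not the kit's algorithm.

KITS = {
    "Strategy Clarity": ["OKR / KPI Tree Builder"],
    "Operating Cadence": ["Cadence Calendar"],
    "Decision Rights": ["Decision Rights Matrix"],
    "Accountability": ["Function Dashboard Kit", "Weekly Retro Template"],
    "Data & Metrics": ["Metric Trust Register", "Function Dashboard Kit"],
}

# Assumed adjacency: which categories amplify each other's weakness.
ADJACENT = {
    "Strategy Clarity": ["Operating Cadence"],
    "Operating Cadence": ["Strategy Clarity", "Decision Rights"],
    "Decision Rights": ["Operating Cadence", "Accountability"],
    "Accountability": ["Decision Rights", "Data & Metrics"],
    "Data & Metrics": ["Accountability"],
}

COMPOUND_WEIGHT = 0.5  # assumed share of a neighbour's weakness added to the cost


def cost_of_delay(category, scores):
    """Own weakness plus a share of the weakness in adjacent categories."""
    own = max(0.0, -scores[category])
    if own == 0.0:
        return 0.0  # a category at 0 or above is not treated as a risk here
    neighbours = sum(max(0.0, -scores[n]) for n in ADJACENT[category])
    return own + COMPOUND_WEIGHT * neighbours


def top_risks(scores, n=3):
    """Return the n highest-cost categories with their cost and recommended kits."""
    ranked = sorted(scores, key=lambda c: cost_of_delay(c, scores), reverse=True)
    return [(c, cost_of_delay(c, scores), KITS[c]) for c in ranked[:n]]


scores = {
    "Strategy Clarity": 0.5,
    "Operating Cadence": -1.0,
    "Decision Rights": -1.0,
    "Accountability": 1.0,
    "Data & Metrics": -1.0,
}
for category, cost, kits in top_risks(scores):
    print(category, cost, kits)
```

In this example three categories share the same raw score of -1.0, but Data & Metrics ranks third because none of its assumed neighbours is weak, while Operating Cadence and Decision Rights compound each other.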

Why Twelve Minutes Is the Right Duration

Operating assessments fail in one of two ways. The short ones produce data too thin to be diagnostic. The long ones never get completed.

Twelve minutes is the duration where the assessment is long enough to capture nuance and short enough to actually get finished. The five categories × two questions each = ten core questions, plus two cross-cutting questions for fragility patterns that span multiple categories. Each question takes roughly a minute to read, consider, and answer thoughtfully.

Operators who complete the assessment in under eight minutes are usually answering by feel rather than by evidence. Operators who take longer than twenty minutes are usually over-thinking the stance-ruler placement; the right answer is almost always the one the operator's gut produces in the first thirty seconds, before the conscious mind starts looking for reasons to upgrade the score.
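The question count and the timing thresholds above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function name and the returned labels are hypothetical, while the counts and the eight- and twenty-minute thresholds are the ones stated in this essay.

```python
# Question count as described: five categories x two questions, plus two cross-cutting.
CATEGORIES = 5
QUESTIONS_PER_CATEGORY = 2
CROSS_CUTTING = 2
TOTAL_QUESTIONS = CATEGORIES * QUESTIONS_PER_CATEGORY + CROSS_CUTTING


def completion_flag(minutes):
    """Classify a completion time using the essay's eight- and twenty-minute thresholds."""
    if minutes < 8:
        return "likely answered by feel"
    if minutes > 20:
        return "likely over-thinking placement"
    return "within expected range"


print(TOTAL_QUESTIONS)
print(completion_flag(12))
```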

The twelve-minute design is also a credibility signal. Operators have been burned by "assessments" that take three hours, produce generic results, and end with a vendor pitch. The Ops Check is structurally different from those experiences. Twelve minutes. No vendor pitch. A scored report. Free.

The Stance Ruler — Why Asymmetric Scoring Matters

Each question is scored on a stance ruler from -2 to +2, with five anchor points: active dysfunction (-2), underdeveloped (-1), functional in normal conditions / fragile under stress (0), working consistently (+1), best-in-pattern (+2).

The asymmetry is deliberate. The numbers are symmetric around zero, but the anchors are not: a score of 0 is not "average"; it is "barely surviving the typical operating week." This calibration matters because operators rating themselves by gut tend to anchor "average" at the middle of the scale, which inflates scores across the board. The asymmetric design pulls the gut response toward the more honest answer.

The five anchor points map to operating reality. -2 is the category in active crisis. -1 is the category that costs the operation visible velocity. 0 is the category that holds during normal weeks and breaks during stress. +1 is the category that holds during stress. +2 is the category other operations would benchmark against.

Most operations score between -1 and +1 across most categories. The Ops Check distribution data shows that composite scores cluster around the 0-to-+0.5 band, with a long left tail (operations in distress) and a short right tail (operations that have done deliberate foundation work). The +2 scores are rare and usually validated against multiple respondents before being trusted.
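For readers who want to see the ruler as data, here is a minimal sketch of the five anchors and one plausible way to roll answers up. It assumes a category score is the mean of its two answers and the composite is the mean of the five category scores; the essay does not state the actual formula, and stage adjustment and the two cross-cutting questions are left out. All function names are hypothetical.

```python
from statistics import mean

# Anchor labels as given in the essay.
ANCHORS = {
    -2: "active dysfunction",
    -1: "underdeveloped",
    0: "functional in normal conditions / fragile under stress",
    1: "working consistently",
    2: "best-in-pattern",
}


def resolve_between(score_a, score_b):
    """When torn between two scores, choose the lower one (per the install guidance)."""
    return min(score_a, score_b)


def category_score(answers):
    """Assumed roll-up: mean of a category's answers, each on the -2..+2 ruler."""
    for answer in answers:
        if answer not in ANCHORS:
            raise ValueError(f"{answer!r} is not a stance-ruler anchor")
    return mean(answers)


def composite(category_scores):
    """Assumed roll-up: unweighted mean of the category scores."""
    return mean(list(category_scores.values()))


print(category_score([resolve_between(0, 1), -1]))
```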

The Two-Layer Model — Why It's the Headline

The single most diagnostic feature of the Ops Check is the two-layer model: a leadership respondent and a senior leader answering for the team each complete the same twelve questions separately, and the perception gap between them is rendered as the headline metric.

The two-layer model emerged from a structural observation that the prior single-respondent assessment missed. Every operating consultant I have worked with has told me, in some form, the same thing: the gap between leadership's view and the team's view of the same operation is the most diagnostic data point in any engagement. Single-respondent assessments cannot surface this gap by construction.

Memo 006 covers the methodology in depth. The short version: leadership scores intent; team scores execution; the gap is the operating reality. CEOs score the operation one to one-and-a-half points higher than their teams on most categories. Wide perception gaps on specific categories are the most pointed diagnostic the report produces.

Operators who complete the assessment in single-respondent mode get a useful report but not the full diagnostic. The first follow-up the report recommends is having one senior leader take the same assessment separately. The two-layer comparison is what converts the report from a self-report into a diagnostic.
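The two-layer comparison can be sketched in a few lines. This is an illustration under assumptions: the gap is taken as leadership score minus team score per category, and the function name and sample scores are invented for the example. The full methodology is the one described in Memo 006, not this sketch.

```python
def perception_gaps(leadership, team):
    """Return (category, gap) pairs, widest absolute gap first.

    gap = leadership score - team score, so a positive gap means
    leadership rates the category higher than the team does.
    """
    gaps = {category: leadership[category] - team[category] for category in leadership}
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented sample scores for illustration only.
leadership = {"Strategy Clarity": 1.0, "Decision Rights": 0.5, "Accountability": 1.0}
team = {"Strategy Clarity": 0.5, "Decision Rights": -1.0, "Accountability": 0.0}

for category, gap in perception_gaps(leadership, team):
    print(f"{category}: {gap:+.1f}")
```

Sorting by absolute gap keeps the widest disagreements at the top regardless of direction, which matches the report's habit of ranking the widest gaps first.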

What the Report Actually Produces

The Ops Check report renders five sections, in this order:

Section 1 — Operator archetype headline. One paragraph naming the operating situation the operator is in. One of seven archetypes (Founder Bottleneck, Heroic Operator, Data Mirage, Quiet Drift, Crisis Operator, PE Portfolio Operator, Compounding Operator). The archetype is selected based on the operator's full category profile, perception gap shape, and stage.

Section 2 — Category-level breakdown. Five rows, one per category, with the score, the band, the one-paragraph narrative for the band, and the recommended next kit. The breakdown is the substantive read of the operation.

Section 3 — Perception gap (two-layer reports only). Side-by-side comparison of leadership and team scores, with the widest gaps ranked first. The diagnostic that's hard to argue with.

Section 4 — AI Readiness Index. A derived score (0-10) with one of five tier labels. Tells the operator whether AI deployment would compound existing strengths or amplify existing weaknesses. The methodology lives in Memo 007.

Section 5 — Top three risks, ranked by cost of delay. The action layer. Each risk has the recommended next kit, the install effort, the expected outcome timeline.

The report is sized to be read in under fifteen minutes by an operator who already knows the business. The structure is designed so each section is independently shareable — operators frequently share specific sections with a peer or a leader for input.

When to Run the Assessment

There are four moments when running the Ops Check produces the most diagnostic value.

In the first thirty days of a new operator role. Before committing to a 100-day plan. The assessment surfaces the gap between what the prior leadership intended and what the operating team actually experiences. New operators who run the assessment in week three usually find the report reshapes their 100-day priorities materially.

Before a board strategy session. The category scores give the board a shared diagnostic instead of a CEO-only narrative. The conversation about which category is constraining the operation becomes evidence-based rather than narrative-driven.

At the start of any quarter expected to be harder than the last one. The Ops Check is calibrated for diagnosis under stress. It surfaces the structural weaknesses that show up first when conditions tighten. Operators who run it preemptively are better positioned than operators who run it reactively.

Before any AI deployment that depends on operating data. The AI Readiness Index is derived directly from the assessment and tells the operator whether AI investment will compound or be wasted. Operations that deploy AI without this diagnostic typically discover the foundation problems six months in, after the AI workflows have been producing outputs nobody fully trusts.

What the Assessment Does NOT Do

Three clarifications that operators sometimes need.

The assessment is not industry benchmarking. The stance ruler is calibrated against operational quality, not against industry-specific KPIs. Operations in different industries with the same composite score are at similar levels of operating maturity, even when the industries have very different absolute performance norms.

The assessment is not a substitute for a deep on-site engagement. It is the diagnostic that scopes such an engagement, not a replacement for it. Operations that need foundational install work get more value from the engagement than they would from the assessment alone; the assessment is what tells them whether the engagement is needed and which kits to start with.

The assessment is not a one-time measurement. Operating systems drift. Quarterly re-assessment is the right cadence; annual is too slow to catch the patterns before they become structural. Operators who re-take the assessment 90 days after their first install can measure whether the install produced real movement on the relevant category.

The Compounding Use of the Assessment

Operators who run the Ops Check once get a snapshot. Operators who run it on a 90-day cadence get a trend.

The trend is the more diagnostic data. Composite scores moving from +0.2 to +0.8 over two quarters indicate that foundation work is producing real structural change. Composite scores flat at +0.5 across multiple quarters indicate a plateau: the operation has stabilized at its current level of capability and isn't compounding. Composite scores drifting backward (from +0.5 to +0.2) indicate that operating discipline is degrading, usually a leadership-attention problem.
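The three trend patterns can be expressed as a simple classifier over a series of quarterly composites. The 0.1 tolerance, the first-to-last comparison, and the function name are assumptions made for this sketch; the essay gives example trajectories, not thresholds.

```python
def classify_trend(composites, tolerance=0.1):
    """Label a series of quarterly composite scores, oldest first.

    Compares the latest score to the earliest; `tolerance` (assumed) is the
    smallest change treated as real movement rather than noise.
    """
    if len(composites) < 2:
        raise ValueError("need at least two assessments to see a trend")
    change = composites[-1] - composites[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "degrading"
    return "plateau"


print(classify_trend([0.2, 0.5, 0.8]))
print(classify_trend([0.5, 0.5, 0.5]))
print(classify_trend([0.5, 0.2]))
```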

The trend data, viewed at the platform level for operating partners running multiple portfolio companies, is even more diagnostic. Cross-portfolio assessment data surfaces which portcos are improving on which categories, which platform-level investments are landing, and which need different intervention. The Portfolio Rollout Playbook (kit-12) is built around this cross-portco data flow.

What to Do This Week

If you have not yet run the assessment, twelve minutes between now and Friday is what it costs.

If you have run it in single-respondent mode, the most useful next step is to have one senior leader on your team take the same assessment separately. The two-layer comparison renders in the report and produces a diagnostic the single-respondent version cannot.

If you have run the two-layer version, the most useful next step is to share the report with the team respondent within 48 hours. The conversation that follows is usually the install itself: discussing the widest perception gaps exposes the structural issues that the kits then close.

The kit guide at /playbooks/ops-health-check covers the assessment mechanics in detail. The assessment itself takes twelve minutes at /ops-check. The report tells you what the next install is.

Twelve minutes is the cost. In return, the next quarter of operating work gets its priorities.

Companion essay to the Ops Health Check kit. The front door of the Zero Confines library.

Sibling kits in the Diagnostic bundle

01 · Assess · Template

Twenty-decision audit.

01 · Assess · Template

Metric trust register.

Free · download now

Take it, install it, tell us what we got wrong.

The kits are versioned in public. Quarterly updates. Newsletter subscribers see the change log first.

Subscribe to Field-notes →

Not sure where to start?

The Ops Health Check is the front door.

Twelve minutes. Personalized phase-by-phase output. Then come back and pick the kit that matches what came out.

Take the Ops Health Check →