How 24 questions serve four analytical objectives
The Health Check needs to accomplish four things simultaneously from a single survey administration. Each objective requires a different analytical lens on the same 24 data points.
1. Executive View — The leader's perception of system health, scored by ledger, category, and stage. Their mental model of where the system is strong and where it's broken.
2. Team View — The aggregate perception of the people who operate within the system daily. Central tendency and dispersion — not just what the team believes, but how much they agree.
3. Disparity Analysis — The gap between the executive's view and the team's view (blind spots, shared pain, false confidence). The gap within the team (fragmented perception). Both are diagnostic findings, not just data points.
4. Cycle Localization — Mapping scores to the I→A→X→A→I stages to determine where specifically in the execution cycle the system breaks down. "Stuck at Insight" demands different intervention than "stuck at eXecution."
The key design constraint: a single 24-question survey, administrable via any standard forms tool (Google Forms, Microsoft Forms, Typeform, SurveyMonkey), must produce all four analyses. No custom tooling required for data collection. Analysis can be done in a spreadsheet.
Questions nest three ways simultaneously
Each question belongs to three hierarchies at once. This is the structural innovation that allows one survey to serve four objectives.
Hierarchy 1: Ledger (2 groups of 12)
The primary diagnostic axis. Reality Ledger (R1–R12) measures shared truth. Delivery Ledger (D1–D12) measures owned action. The gap between them reveals the coupling state.
Hierarchy 2: Category (8 groups of 3)
The granular diagnostic. Each ledger contains 4 categories of 3 questions each. Categories identify which domain within a ledger is weakest — is it the facts, the tradeoffs, the ownership, or the authority?
Hierarchy 3: IAXAI Stage (4 groups of 6)
The cycle localization axis. By pairing two categories per stage, we can determine where in the I→A→X→A→I cycle execution breaks. This is the hierarchy that no other diagnostic provides.
Every question, every hierarchy, every stage
Scale: 1 = Never true, 2 = Rarely, 3 = Sometimes, 4 = Usually, 5 = Always true
| ID | Question | Ledger | Category | Stage |
|---|---|---|---|---|
| R1 | When a problem surfaces, all stakeholders are working from the same set of facts. | Reality | Shared Facts | Insight |
| R2 | Data and status updates reach the people who need them without someone having to chase them down. | Reality | Shared Facts | Insight |
| R3 | New team members can find out what is actually happening without relying on tribal knowledge. | Reality | Shared Facts | Insight |
| R7 | Resource limitations (time, money, people) are acknowledged openly, not quietly absorbed. | Reality | True Constraints | Insight |
| R8 | Deadlines reflect actual capacity, not aspirational thinking. | Reality | True Constraints | Insight |
| R9 | When something is not going to work, people say so before it fails — not after. | Reality | True Constraints | Insight |
| R4 | Tradeoffs are stated out loud before decisions are made — not discovered after. | Reality | Honest Tradeoffs | Alignment |
| R5 | People feel safe raising bad news or contradicting the prevailing narrative. | Reality | Honest Tradeoffs | Alignment |
| R6 | When two priorities conflict, the organization resolves it explicitly rather than pretending both will get done. | Reality | Honest Tradeoffs | Alignment |
| R10 | Reports to leadership reflect what is actually happening, not a polished version of it. | Reality | No Spin | Alignment |
| R11 | The story told to investors, the board, or external partners matches internal reality. | Reality | No Spin | Alignment |
| R12 | People do not have to translate between what is said and what is meant in this organization. | Reality | No Spin | Alignment |
| D1 | Every active initiative has a single person who owns the outcome — not just the tasks. | Delivery | Explicit Ownership | eXecution |
| D2 | When something goes wrong, it is clear who is accountable without a blame conversation. | Delivery | Explicit Ownership | eXecution |
| D3 | Ownership is assigned at the start of work, not figured out as things unfold. | Delivery | Explicit Ownership | eXecution |
| D4 | People with accountability also have the authority to make decisions in their domain. | Delivery | Clear Authority | eXecution |
| D5 | A decision made by the right person stays decided — it does not get relitigated. | Delivery | Clear Authority | eXecution |
| D6 | Managers do not need to escalate routine decisions; they have real decision rights. | Delivery | Clear Authority | eXecution |
| D7 | It is clear which decisions require group input and which are made by an individual. | Delivery | Decision Rights | Accountability |
| D8 | Meetings end with explicit next steps and named owners, not vague consensus. | Delivery | Decision Rights | Accountability |
| D9 | Cross-team decisions have a defined process — they do not require a leader to broker every time. | Delivery | Decision Rights | Accountability |
| D10 | The same fire does not have to be fought more than once. | Delivery | Sustainable Rhythm | Accountability |
| D11 | Leaders can take time off without the system stalling. | Delivery | Sustainable Rhythm | Accountability |
| D12 | The pace of work is one the team can maintain for the next twelve months. | Delivery | Sustainable Rhythm | Accountability |
The conceptual logic of the stage assignments
Shared Facts + True Constraints → 6 questions
Insight is about whether the raw material of shared reality exists. Do people have access to the same facts (R1–R3)? Are the actual limitations visible rather than hidden (R7–R9)? If these score low, the organization cannot begin the cycle — it's operating on divergent or aspirational versions of reality. The diagnosis hasn't happened yet.
Honest Tradeoffs + No Spin → 6 questions
Alignment is about whether shared reality is agreed upon and honest. Are tradeoffs named explicitly (R4–R6)? Does what's reported internally match what's communicated externally (R10–R12)? If these score low, the facts may exist but they haven't been processed into shared commitments. The organization sees reality but hasn't converged on what to do about it.
Explicit Ownership + Clear Authority → 6 questions
eXecution is the coupling point — the trunk of the tree. Has shared, agreed-upon reality been converted into named, empowered ownership? Do owners exist (D1–D3) and do they have the authority to act (D4–D6)? If these score low while Reality scores high, the organization is in Paralysis — strong roots, wilting canopy. The coupling is broken at the handoff.
Decision Rights + Sustainable Rhythm → 6 questions
Accountability is whether the delivery system functions under load. Is the decision-making process itself clear (D7–D9)? Can the system sustain without heroic effort (D10–D12)? If these score low while eXecution scores adequately, ownership was assigned but the system for maintaining and enforcing it doesn't hold. The canopy grew but can't sustain itself.
Not directly measured — inferred from longitudinal change
Intelligence is the meta-stage. It has no dedicated questions because it examines the coupling itself — the health of the loop. It is measured by change across administrations: did the coupling gap narrow? Did the weakest stage improve? Did variance decrease? Intelligence is the delta, not the snapshot. It's why we re-measure.
Every score the instrument produces
| Level | Components | Range | What It Reveals |
|---|---|---|---|
| Overall | All 24 questions | 24–120 | Gross system health. Screening measure. |
| Ledger (×2) | 12 questions each | 12–60 | Which half of the coupled system is weaker. |
| Category (×8) | 3 questions each | 3–15 | Which domain within a ledger is weakest. |
| Stage (×4) | 6 questions each | 6–30 | Where in I→A→X→A the cycle breaks. |
| Coupling Gap | |Reality% – Delivery%| | 0–80% | The balance between the two systems. (Each ledger percentage floors at 20% — raw minimum 12/60 — so the maximum possible gap is 80 points.) |
Interpretation thresholds (percentage of max)
| Range | Level | Meaning |
|---|---|---|
| 80–100% | Strong | System is well-designed. Monitor for drift. |
| 60–79% | Moderate | Real strengths but clear gaps. Debt accumulating in specific areas. |
| 40–59% | Needs Work | System is under-designed. Leadership compensating for structural gaps. |
| Below 40% | Critical | System significantly incomplete. Leadership exhaustion is systemic. |
These thresholds apply at every level: overall, ledger, category, and stage. A category at 80%+ with another at 40% tells a sharper story than the ledger average alone.
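Because the thresholds are identical at every level, the banding logic is mechanical enough to script. A minimal Python sketch (the function name `interpret_level` is illustrative, not part of the instrument):

```python
def interpret_level(pct: float) -> str:
    """Map a score (as % of its maximum) to the interpretation band.

    Thresholds follow the table above: 80+ Strong, 60-79 Moderate,
    40-59 Needs Work, below 40 Critical.
    """
    if pct >= 80:
        return "Strong"
    if pct >= 60:
        return "Moderate"
    if pct >= 40:
        return "Needs Work"
    return "Critical"
```

The same function bands overall, ledger, category, and stage percentages.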
Three failure modes from two scores
| Failure Mode | Condition | Root/Canopy Read | Primary Intervention |
|---|---|---|---|
| Paralysis | Reality ≥ 60%, Delivery < 60% | Strong roots, wilting canopy | Start at eXecution stage — assign ownership, match authority |
| Chaos | Reality < 60%, Delivery ≥ 60% | Big canopy, shallow roots | Start at Insight stage — establish shared facts before acting |
| Firefighting | Both < 50% | Both systems degraded | Start at coupling — both ledgers simultaneously |
Paralysis and Chaos are single-ledger failures — the direct consequence of treating only one side. They are what State B looks like in the data. Firefighting is coupled degradation — State C. The failure mode doesn't just name the problem. It names which ledger was treated and which was neglected.
Objective 4: pinpointing where the cycle breaks
This is the analysis no other diagnostic produces. By computing a score for each of the four measurable stages, we identify where specifically the execution cycle fails.
Insight_Score = (R1 + R2 + R3 + R7 + R8 + R9) / 30 × 100
Alignment_Score = (R4 + R5 + R6 + R10 + R11 + R12) / 30 × 100
Execution_Score = (D1 + D2 + D3 + D4 + D5 + D6) / 30 × 100
Accountability_Score = (D7 + D8 + D9 + D10 + D11 + D12) / 30 × 100
── Weakest Stage = Primary Failure Point ──
Failure_Stage = argmin(Insight, Alignment, Execution, Accountability)
── Stage Gap = difference between strongest and weakest ──
Stage_Gap = max(all stages) − min(all stages)
→ Gap > 20pts: the cycle is breaking at a specific point
→ Gap < 10pts: degradation is distributed, not localized
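The stage formulas translate directly to code. A sketch assuming responses arrive as two 12-element lists in ID order (R1..R12, D1..D12); the function names are illustrative:

```python
def stage_scores(r: list[int], d: list[int]) -> dict[str, float]:
    """Compute the four stage scores (as % of max) from R1..R12 and D1..D12.

    `r` and `d` are 12-element lists of 1-5 responses, index 0 = R1/D1.
    Stage composition follows the mapping above.
    """
    def pct(items):                 # 6 questions, max 30
        return sum(items) / 30 * 100
    return {
        "Insight":        pct(r[0:3] + r[6:9]),   # R1-R3 + R7-R9
        "Alignment":      pct(r[3:6] + r[9:12]),  # R4-R6 + R10-R12
        "eXecution":      pct(d[0:6]),            # D1-D6
        "Accountability": pct(d[6:12]),           # D7-D12
    }

def localize(scores: dict[str, float]) -> tuple[str, float]:
    """Return (weakest stage = primary failure point, stage gap)."""
    weakest = min(scores, key=scores.get)
    gap = max(scores.values()) - min(scores.values())
    return weakest, gap
```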
Reading the stage profile
| Profile Pattern | Diagnosis | Intervention Entry Point |
|---|---|---|
| I low, A/X/A moderate+ | The organization can't see clearly. Facts are siloed, constraints hidden. Downstream stages are working from incomplete reality. | Begin at Insight: kill competing data sources, surface true constraints, establish single source of truth. |
| I adequate, A low, X/A moderate+ | Facts exist but aren't agreed upon. Tradeoffs are implicit. Internal narrative differs from external. Reality is available but not shared. | Begin at Alignment: force-rank priorities, document tradeoffs explicitly, eliminate spin. |
| I/A adequate, X low, A moderate | The coupling is broken. Shared truth exists but hasn't converted to owned commitment. Classic Paralysis — everyone sees it, no one owns it. | Begin at eXecution: name singular owners, match authority, document decision rights. |
| I/A/X adequate, A₂ low | Ownership exists but the delivery system can't sustain it. Decisions get relitigated. The same fires recur. Leaders are load-bearing walls. | Begin at Accountability: clarify decision process, break recurring cycles, establish sustainable rhythm. |
| All low, small gap | Distributed degradation. Firefighting. The leader is the system. | Begin at the coupling — both ledgers simultaneously. Apply all four Operator Rules. |
What the individual administration produces
The executive completes all 24 questions alone. No guidance on answers — the value is in their honest perception. This produces:
Exec_Reality = sum(R1..R12) → score/60 → percentage
Exec_Delivery = sum(D1..D12) → score/60 → percentage
Exec_Overall = Reality + Delivery → score/120
Exec_Gap = |Reality% − Delivery%|
Exec_Failure_Mode = determined by ledger percentages
Exec_Stage[4] = 4 stage scores → weakest = failure point
Exec_Cat[8] = 8 category scores → weakest two = priority focus
After completion, the practitioner and the leader walk through the results together. The first question: "Does this match what you feel in your day-to-day?" Discrepancies between the scored result and the leader's gut feeling are themselves diagnostic. The score shows what the leader believes about the system. The gut shows what the leader experiences. When those diverge, it usually means the leader is compensating for structural gaps without realizing it.
Aggregate perception and internal agreement
Each team member completes the 24 questions independently and anonymously. Responses are aggregated to produce both central tendency (what the team believes) and dispersion (how much they agree).
Team_Mean[q] = average of all responses for question q
Team_SD[q] = standard deviation of responses for question q
Team_Min[q] = lowest response (the most concerned person)
Team_Max[q] = highest response
Team_Range[q] = Max − Min
── Per Category (c = each of 8 categories) ──
Cat_Mean[c] = mean of 3 constituent question means
Cat_SD[c] = mean SD of the 3 constituent questions
── Per Ledger ──
Team_Reality = sum of question means for R1..R12
Team_Delivery = sum of question means for D1..D12
── Per Stage ──
Stage_Mean[s] = mean of 6 constituent question means / 30 × 100
Stage_SD[s] = mean SD of the 6 constituent questions
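The per-question aggregates can be computed with the standard library alone. A sketch assuming each team member's answers arrive as a 24-element list (R1..R12 then D1..D12); the function name and return shape are my own:

```python
from statistics import mean, stdev

def team_aggregate(responses: list[list[int]]) -> dict:
    """Aggregate anonymous team responses for the 24 questions.

    `responses` holds one 24-element list per team member. Uses the
    sample standard deviation (spreadsheet STDEV), so at least two
    respondents are required.
    """
    cols = list(zip(*responses))                 # one column per question
    q_mean = [mean(c) for c in cols]
    return {
        "q_mean": q_mean,
        "q_sd": [stdev(c) for c in cols],
        "q_range": [max(c) - min(c) for c in cols],
        "team_reality": sum(q_mean[:12]),        # sum of R1..R12 means
        "team_delivery": sum(q_mean[12:]),       # sum of D1..D12 means
    }
```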
The variance finding
Standard deviation per question
Any question where SD > 1.2 (on the 5-point scale) means people experience that aspect of the system fundamentally differently. For context: if half the team answers 2 and half answers 4, SD ≈ 1.0. If the spread is wider — 1s and 5s — SD climbs above 1.4.
High variance is not a Delivery problem. It is a Reality Ledger failure. People are not seeing the same system. The Health Check has just demonstrated the very failure it's designed to detect — in real time, with their own data.
Sample size considerations
| Team Size | Statistical Approach | Notes |
|---|---|---|
| N ≥ 8 | Full analysis: means, SDs, perception gaps, all thresholds apply | Preferred. Adequate for parametric statistics on Likert data. |
| N = 5–7 | Means and SDs valid but interpret cautiously. Flag where N is small. | SD thresholds still useful but single outliers have more influence. |
| N < 5 | Report medians and ranges rather than means/SDs. Treat as directional. | Too few for reliable variance measures. Use for conversation, not diagnosis. |
Executive vs. team — and team vs. team
This is the most powerful deployment. The executive completes the survey individually (Mode A); the team completes it anonymously (Mode B). Then we compute two types of disparity.
Disparity Type 1: Perception Gap (Executive vs. Team)
Perception_Gap[q] = Exec_Score[q] − Team_Mean[q]
→ Positive gap: leader sees system as healthier than team does
→ Negative gap: leader more critical than team
── Thresholds ──
|Gap| ≥ 2.0 → Critical divergence. Leader and team see different systems.
|Gap| ≥ 1.5 → Notable divergence. Worth investigating.
|Gap| < 1.5 → Reasonable alignment on this dimension.
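As code, the threshold logic is a three-way band. A sketch that treats anything below the "notable" threshold as reasonable alignment; the function name is illustrative:

```python
def gap_flag(exec_score: int, team_mean: float) -> str:
    """Label the executive-vs-team gap per the thresholds above."""
    gap = abs(exec_score - team_mean)
    if gap >= 2.0:
        return "critical divergence"
    if gap >= 1.5:
        return "notable divergence"
    return "reasonable alignment"
```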
Four diagnostic patterns
| Pattern | Condition | What It Means |
|---|---|---|
| Blind Spot | Exec ≥ 4, Team Mean ≤ 3 | The leader believes this works because it works for them. The team experiences a different reality. Most common and most dangerous pattern. |
| Shared Pain | Both ≤ 3 | Everyone agrees it's broken. Start here. Alignment already exists — move directly to Operator Rules. |
| False Confidence | Exec ≥ 4, Team SD > 1.2 | Appears functional from the top. Inconsistently experienced at the working level. Leader anchors on the successful instances; team lives the variance. |
| Unacknowledged Strength | Team Mean > Exec by ≥ 1.5 | Leader carries concern about something the team has already resolved. Frees attention for real gaps. |
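The four patterns can be checked per question. Note they are not mutually exclusive (a question can satisfy both Blind Spot and False Confidence); this sketch checks False Confidence first, which is my ordering choice, not something the table prescribes:

```python
def diagnostic_pattern(exec_q: int, team_mean: float, team_sd: float) -> str:
    """Classify one question into the patterns above; "" if none match."""
    if exec_q >= 4 and team_sd > 1.2:
        return "False Confidence"         # leader anchors on the successes
    if exec_q >= 4 and team_mean <= 3:
        return "Blind Spot"               # works for the leader, not the team
    if exec_q <= 3 and team_mean <= 3:
        return "Shared Pain"              # everyone agrees it's broken
    if team_mean - exec_q >= 1.5:
        return "Unacknowledged Strength"  # team already resolved it
    return ""
```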
Disparity Type 2: Intra-Team Fragmentation
Fragmentation[q] = Team_SD[q]
SD > 1.2 → Fragmented perception. People see this differently.
SD 0.8–1.2 → Normal variation. Some disagreement, not structural.
SD < 0.8 → Strong consensus. Team agrees on this dimension.
── Per Category ──
Cat_Fragmentation[c] = mean SD of 3 constituent questions
── Per Stage ──
Stage_Fragmentation[s] = mean SD of 6 constituent questions
→ High stage fragmentation = team doesn't agree on
whether this part of the cycle works. That IS the finding.
A category mean of 3.5 with SD of 0.5 means "everyone thinks this is mediocre." A category mean of 3.5 with SD of 1.4 means "some people think this is great, some think it's terrible, and the average is meaningless." The second case is a more urgent finding — because the disagreement itself is a Reality Ledger failure. People are experiencing different organizations.
What the facilitated session works from
The Mode C report has five sections, each mapping to a specific conversation the practitioner facilitates.
| Section | Content | The Conversation It Opens |
|---|---|---|
| 1. Overview | Exec scores vs. Team means — overall, per ledger, per stage. Direction and magnitude of each gap. | "Here's what you see. Here's what the team sees. Let's talk about the distance between them." |
| 2. Perception Gap | All 24 questions sorted by gap magnitude. Flagged: gap ≥ 2 (critical), gap ≥ 1.5 (notable). | "These are the specific dimensions where you and your team see different systems." |
| 3. Fragmentation Map | All 24 questions sorted by team SD. Flagged: SD > 1.2 (fragmented). | "These are the dimensions where your team doesn't agree with each other. This is a Reality Ledger failure demonstrated in real time." |
| 4. Stage Profile | Four stage scores (exec and team). Weakest stage highlighted. Stage gap computed. | "This is where the cycle breaks. This is where we start." |
| 5. Priority Actions | If large gap: align before acting. If small gap + low scores: apply Operator Rules directly. Derived from the comparative analysis, not from either score alone. | "Here's the one thing to fix first. Here's who owns it. Here's when we measure again." |
Is the math sound?
Likert scale treatment
The 5-point Likert scale produces ordinal data. The standard practice in organizational survey research — supported by Carifio & Perla (2008), Norman (2010), and Sullivan & Artino (2013) — is to treat 5-point Likert items as interval data when computing means and standard deviations, provided items are aggregated into scales of 3 or more. The Health Check meets this criterion at every level: 3 items per category, 6 per stage, 12 per ledger.
Internal consistency
Each category contains 3 questions measuring the same construct. This is the minimum for computing Cronbach's alpha (α). Target: α ≥ 0.70 per category. Each stage contains 6 questions (two categories), which provides better reliability. After the first 20+ administrations, alpha should be computed per category and per stage. If any category falls below 0.65, the constituent questions may need revision — they may be measuring different constructs.
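Alpha is simple enough to compute without a statistics package. A sketch using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of totals); the function name and input shape are assumptions:

```python
from statistics import variance

def cronbach_alpha(item_scores: list[list[int]]) -> float:
    """Cronbach's alpha for one category (or one stage).

    `item_scores` holds one list per respondent, each containing that
    respondent's answers to the k items (k=3 per category here).
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))                    # per-item columns
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_vars / total_var)
```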
Why 3 questions per category (not 4 or 5)
Three is the minimum for internal consistency measurement while keeping the total instrument at 24 questions (5-minute administration). Expanding to 4 per category would require 32 questions, increasing completion time to ~7 minutes and introducing fatigue effects. The 24-question design is optimized for executive tolerance — the people who most need to take it are the people with the least patience for surveys.
SD threshold of 1.2
On a 5-point scale, an SD of 1.2 spans 30% of the 4-point range between the scale's minimum (1) and maximum (5). For context:
| Team Distribution | Approximate SD | Interpretation |
|---|---|---|
| All respondents answer 3 or 4 | ~0.5 | Strong consensus. Minor variation. |
| Split between 2, 3, and 4 | ~0.8 | Normal variation. Not alarming. |
| Half answer 2, half answer 4 | ~1.0 | Emerging divergence. Worth noting. |
| Spread across 1–4 or 2–5 | ~1.2 | Threshold. Fragmented perception. |
| Bimodal: 1s and 5s | ~1.6+ | Fundamentally different experiences of the same system. |
The 1.2 threshold catches meaningful disagreement without being overly sensitive. It fires when the team is genuinely split, not when there's normal variation in perspective.
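The table's approximations are easy to verify with the population SD (spreadsheet STDEVP); the sample SD (STDEV) runs somewhat higher at these tiny N. A sketch with illustrative four-person distributions — note a pure half-1s/half-5s split actually yields 2.0, comfortably past the table's "1.6+":

```python
from statistics import pstdev

# Population SDs for the distributions illustrated in the table.
sd_consensus = pstdev([3, 3, 4, 4])   # all 3s and 4s       -> 0.5
sd_normal    = pstdev([2, 3, 4])      # split 2/3/4         -> ~0.82
sd_split     = pstdev([2, 2, 4, 4])   # half 2, half 4      -> 1.0
sd_spread    = pstdev([1, 2, 3, 4])   # uniform over 1-4    -> ~1.12
sd_bimodal   = pstdev([1, 1, 5, 5])   # bimodal 1s and 5s   -> 2.0
```

The uniform 1–4 spread lands just under the 1.2 threshold; real distributions weighted toward the extremes push past it.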
Perception gap threshold of ≥ 2.0
A 2-point gap on a 5-point scale separates the executive and the team mean by half of the 4-point range: the executive answering "Usually" while the team mean sits at "Rarely." The 1.5 threshold (37.5% of the range) is flagged as "notable": enough to investigate, not enough to alarm.
The coupling gap
Computed as |Reality% – Delivery%|, where each percentage is the ledger score divided by its maximum (60). Because the two ledgers share an identical 12–60 scale, raw and percentage gaps carry the same information; expressing the gap in percentage points simply makes it directly interpretable and comparable across levels.
── Interpretation ──
Gap < 10% → Ledgers are reasonably balanced. Focus on overall level.
Gap 10–20% → Imbalance emerging. One ledger pulling ahead or behind.
Gap > 20% → Single-ledger plateau territory. The stronger ledger
was treated. The weaker is pulling it back.
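The computation and banding together, as a sketch; the function name and band labels are illustrative:

```python
def coupling_gap(reality_score: int, delivery_score: int) -> tuple[float, str]:
    """Coupling gap (percentage points) and its interpretation band.

    Inputs are raw ledger scores (12-60 each).
    """
    gap = abs(reality_score - delivery_score) / 60 * 100
    if gap > 20:
        band = "single-ledger plateau territory"
    elif gap >= 10:
        band = "imbalance emerging"
    else:
        band = "reasonably balanced"
    return gap, band
```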
Boundaries of the diagnostic
Intellectual honesty about what the Health Check doesn't do:
It measures perception, not objective reality. The Health Check measures how people experience the system, not whether the system is objectively well-designed. This is a feature, not a bug — perception IS the operational reality. But it means the scores can be influenced by recency bias from the latest events or by organizational mood.
It doesn't measure the Intelligence stage directly. Intelligence (the learning loop) is inferred from longitudinal change, not from a single administration. One snapshot can localize the failure. Only repeated measurement can tell you whether the system is learning.
3 questions per construct is the minimum, not ideal. With 3 items, one poorly understood question can skew a category score significantly. This is acceptable for a 5-minute diagnostic but means individual category scores should be treated as directional, not precise. Stage scores (6 items) and ledger scores (12 items) are more reliable.
It doesn't explain causality. The Health Check identifies where the system is weak. It doesn't explain why. That's what the practitioner conversation is for — the facilitated session after the data is presented. The instrument creates the diagnostic map. The practitioner reads it.
Any standard forms tool — nothing custom required
Form structure
| Field | Type | Purpose |
|---|---|---|
| Role identifier | Dropdown: "Executive" / "Team Member" | Separates Mode A from Mode B in the same data set. Enables comparative analysis without separate forms. |
| R1 through R12 | 5-point scale (radio or slider) | Reality Ledger questions. Labeled: Never / Rarely / Sometimes / Usually / Always |
| D1 through D12 | 5-point scale (radio or slider) | Delivery Ledger questions. Same labels. |
Total: 25 fields (1 role identifier + 24 Likert items). No open-text fields. No conditional logic. Any forms tool that supports radio buttons can run this.
Administration sequence
Present Reality Ledger questions first (R1–R12), then Delivery Ledger (D1–D12). Within each ledger, present in category order. Do not randomize — the conceptual flow from Shared Facts → Honest Tradeoffs → True Constraints → No Spin creates a natural progression that aids honest reflection. Same for Delivery: Ownership → Authority → Decision Rights → Sustainable Rhythm builds from "who owns it" to "can the system sustain."
Framing language (pre-survey)
"This is a system diagnostic, not a performance review. Answer about the system, not about any individual. There are no right answers — only your honest experience of how the organization operates day to day."
The formulas — ready to paste
After exporting responses to CSV/Excel, the analysis requires straightforward formulas. Below assumes row 1 = headers, row 2 = executive response, rows 3+ = team responses. Columns B through Y contain the 24 question responses (R1–R12 in B–M, D1–D12 in N–Y).
Reality_Score = SUM(B2:M2) ← executive row: sum of R1..R12
Delivery_Score = SUM(N2:Y2) ← executive row: sum of D1..D12
Reality_% = Reality_Score / 60 × 100
Delivery_% = Delivery_Score / 60 × 100
Coupling_Gap = ABS(Reality_% − Delivery_%)
── STAGE SCORES ──
Insight = (R1+R2+R3+R7+R8+R9) / 30 × 100
Alignment = (R4+R5+R6+R10+R11+R12) / 30 × 100
Execution = (D1+D2+D3+D4+D5+D6) / 30 × 100
Accountability = (D7+D8+D9+D10+D11+D12) / 30 × 100
── TEAM AGGREGATES (per question column, rows 3+) ──
Team_Mean[q] = AVERAGE(q3:qN)
Team_SD[q] = STDEV(q3:qN)
Team_Min[q] = MIN(q3:qN)
Team_Max[q] = MAX(q3:qN)
── PERCEPTION GAP (per question) ──
Gap[q] = Exec_Score[q] − Team_Mean[q]
── FLAGS ──
Blind_Spot[q] = IF(Exec[q]>=4 AND Team_Mean[q]<=3, "BLIND SPOT", "")
Shared_Pain[q] = IF(Exec[q]<=3 AND Team_Mean[q]<=3, "SHARED PAIN", "")
False_Conf[q] = IF(Exec[q]>=4 AND Team_SD[q]>1.2, "FALSE CONFIDENCE", "")
Fragmented[q] = IF(Team_SD[q]>1.2, "FRAGMENTED", "")
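For anyone who prefers a script to a spreadsheet, the same analysis runs on the exported rows with the standard library alone. A sketch assuming the export layout described above (a "Role" column plus string-valued columns R1..R12 and D1..D12, exactly what `csv.DictReader` yields); the function names and return shape are my own:

```python
import csv
from statistics import mean, stdev

QS = [f"R{i}" for i in range(1, 13)] + [f"D{i}" for i in range(1, 13)]

def analyze(rows: list[dict]) -> dict:
    """Replicate the spreadsheet analysis on exported response rows."""
    exec_row = next(r for r in rows if r["Role"] == "Executive")
    team = [r for r in rows if r["Role"] == "Team Member"]
    exec_q = {q: int(exec_row[q]) for q in QS}
    t_mean = {q: mean(int(r[q]) for r in team) for q in QS}
    t_sd = {q: stdev(int(r[q]) for r in team) for q in QS}
    return {
        "exec_reality_pct": sum(exec_q[q] for q in QS[:12]) / 60 * 100,
        "exec_delivery_pct": sum(exec_q[q] for q in QS[12:]) / 60 * 100,
        "perception_gap": {q: exec_q[q] - t_mean[q] for q in QS},
        "fragmented": [q for q in QS if t_sd[q] > 1.2],
    }

def analyze_csv(path: str) -> dict:
    """Load the form export and run the analysis."""
    with open(path, newline="") as f:
        return analyze(list(csv.DictReader(f)))
```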
Objective 4+: measuring Intelligence through change
When the Health Check is administered a second time (recommended: 6–8 weeks after intervention begins), the Intelligence stage becomes measurable.
Δ_Stage[s] = Stage_Score_T2[s] − Stage_Score_T1[s]
Δ_Coupling_Gap = Coupling_Gap_T2 − Coupling_Gap_T1
Δ_Variance[q] = Team_SD_T2[q] − Team_SD_T1[q]
── Intelligence Indicators ──
Coupling gap narrowing → system is re-coupling
Weakest stage improving → intervention is correctly targeted
Team variance decreasing → shared reality is strengthening
Perception gap narrowing → exec and team converging on same view
── Warning Signs ──
Coupling gap widening → intervention is single-ledger; plateau incoming
Treated stage improved but
untreated stage declined → coupling degradation; untreated side eroding
Variance increasing → shared reality is fragmenting, not converging
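The delta computation itself is trivial once both administrations are scored. A sketch; the function name and the dict-of-metrics shape are assumptions:

```python
def intelligence_deltas(t1: dict[str, float], t2: dict[str, float]) -> dict[str, float]:
    """Deltas between two administrations (T2 - T1).

    `t1`/`t2` map metric names (stage scores, coupling gap, per-question
    SDs, ...) to values. Positive stage deltas and negative coupling-gap
    and variance deltas are the healthy directions.
    """
    return {k: t2[k] - t1[k] for k in t1}
```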
This is where the palindrome earns its name. Intelligence (the measured change) feeds the next Insight (what to diagnose next). The cycle either compounds or it doesn't — and the delta data tells you which.
Validation matrix
| Objective | Required Data | Produced By | Status |
|---|---|---|---|
| 1. Executive View | Individual scores: overall, ledger, category, stage, failure mode | Mode A: single administration, standard scoring | ✓ Complete |
| 2. Team View | Aggregate: means, SDs, medians per question/category/ledger/stage | Mode B: anonymous team, standard aggregation | ✓ Complete |
| 3a. Exec vs. Team | Perception gaps, blind spots, shared pain, false confidence | Mode C: comparative analysis (gap = exec − team mean) | ✓ Complete |
| 3b. Intra-Team | Per-question SD, fragmentation flags, consensus mapping | Mode B analysis: SD per question, threshold flagging | ✓ Complete |
| 4. Cycle Localization | Stage scores, stage profile, weakest stage identification | Stage mapping: 6 questions per stage, min-stage = failure point | ✓ Complete |
| 4+. Intelligence | Deltas across administrations: stage, coupling gap, variance | Mode D: longitudinal comparison (T2 − T1) | ✓ Complete |
What we'll learn from the first 5 administrations
Internal consistency. Compute Cronbach's alpha per category after N ≥ 20 individual responses. If any category α < 0.65, the constituent questions may be measuring different things and need revision.
Stage mapping validity. Does the stage localization match the practitioner's independent diagnosis? If the instrument says "stuck at Alignment" but the practitioner's field reading says "stuck at eXecution," either the mapping needs adjustment or the practitioner has a blind spot.
Threshold calibration. The 60% boundary between failure modes, the 1.2 SD threshold for fragmentation, and the 2.0 perception gap threshold are all reasonable starting points but should be refined with empirical data. They may need adjustment by industry (healthcare vs. financial services) or by organization size.
Question clarity. Any question where the team's SD is consistently high across multiple organizations may be poorly worded — people might be interpreting it differently, not experiencing the system differently. R5 ("People feel safe raising bad news") is the most likely candidate — "safe" means different things in different cultures.