May 2026 · Design
Methodology in action — a prototype
Stage 2 of a series on the design choices behind CoNoggin. Stage 1 mapped the field. This one runs a worked example.
The question
Can the nine parameters from Stage 1 be applied to a real case, used to pick a complementary set of methodologies, and turned into a CoNoggin change intervention that holds together?
Inputs
Scenario
A public-broadcasting newsroom adopts AI research tools — fast triangulation of facts, lead surfacing, long-document structuring — without weakening verification discipline. A documented, current pressure across the industry.
The room
What's already true, before any goal is authored:
- Public-service broadcaster, charter-bound to inform, educate, entertain. Trust is the operating capital.
- News division, ~6,000 staff, sub-units (Investigations · World · Verify · Programmes).
- Published guidelines on AI use in journalism (2024–2025) emphasising human oversight, transparency, accuracy.
- AI-saturated information environment; regulator and audience scrutiny on impartiality.
- Editorial-standards-coded culture; long memory for high-profile errors.
The goal (three-field narrative input)
Authored by the Head of News Operations, Newsroom Investigations:
The change (and how we'll see it)
The team starts using AI research tools regularly for early scoping — without weakening verification discipline. We'll see it in: AI-assisted research entering early drafts with the same standards we apply to anything else. No AI-derived material reaching script without a human-verified trail.
For whom
The full Newsroom Investigations team — 30 people across senior reporters, producers, and editors. Mixed exposure to AI tools so far; mixed appetites; same standards required of all.
The obstacle
Time pressure. When an AI tool gives a fast plausible answer, the temptation is to take it as a starting point and skip the verification step. Two near-misses already this quarter.
The nine-parameter pass
| Question | Read | Confidence |
|---|---|---|
| What kind of problem is this? (Cynefin) | Complex. AI tooling is too new for stable best practice; behaviour patterns must emerge through small experiments. Probe-sense-respond, not analyse-then-prescribe. | High |
| What's the change? | Verification rigour preserved while AI use expands. | High |
| What's the behaviour shift? | Journalists run an explicit verification step on AI-assisted material before it reaches draft scripts. | High |
| Where's the audience starting from? | Mixed across Stages of Change. | Medium |
| Why now? | Two near-misses already; ongoing regulator and audience scrutiny. | High |
| How will we know? | Lead: verification-step completion when AI is used. Lagging: error rate, near-miss count. | High |
| By when? | ~12 weeks to embed habits; ongoing reinforcement after. | Medium |
| What's the lead measure? | Frequency of explicit verification steps in AI-assisted drafts. | High |
| What's the obstacle? | Named: time pressure → lower verification's Ability cost (Fogg's B = M × A × P). | High |
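The pass above can also be held as plain data. The following is a minimal sketch, not CoNoggin's actual data model: `ParameterRead`, `NINE_PARAMETER_PASS` and `needs_clarifying_question` are hypothetical names introduced for illustration, and the reads are abbreviated from the table. It shows how any read below High confidence could be singled out for a follow-up question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterRead:
    """One row of the nine-parameter pass (hypothetical structure)."""
    question: str
    read: str
    confidence: str  # "High", "Medium" or "Low"


# Abbreviated copies of the nine rows in the table above.
NINE_PARAMETER_PASS = [
    ParameterRead("What kind of problem is this?", "Complex; probe-sense-respond", "High"),
    ParameterRead("What's the change?", "Verification rigour preserved while AI use expands", "High"),
    ParameterRead("What's the behaviour shift?", "Explicit verification step before draft scripts", "High"),
    ParameterRead("Where's the audience starting from?", "Mixed across Stages of Change", "Medium"),
    ParameterRead("Why now?", "Two near-misses; ongoing scrutiny", "High"),
    ParameterRead("How will we know?", "Lead: verification-step completion; lagging: error rate", "High"),
    ParameterRead("By when?", "~12 weeks to embed habits", "Medium"),
    ParameterRead("What's the lead measure?", "Frequency of explicit verification steps", "High"),
    ParameterRead("What's the obstacle?", "Time pressure; lower verification's Ability cost", "High"),
]


def needs_clarifying_question(reads):
    """Return the questions whose read is below High confidence."""
    return [r.question for r in reads if r.confidence != "High"]


if __name__ == "__main__":
    for question in needs_clarifying_question(NINE_PARAMETER_PASS):
        print("Ask the goal author about:", question)
```

Run against this pass, the helper returns the audience starting state and the time horizon, the two reads marked Medium in the table.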
Outputs
Methodology pick
A complementary set, not a single framework. The four load-bearing cards:
Sense-making
Cynefin
Five domains (Clear · Complicated · Complex · Chaotic · Confused), each demanding a different kind of action. Classifies this problem as Complex.
Primary stance · probe-sense-respond
Behaviour
Tiny Habits / B=MAP
Behaviour = Motivation × Ability × Prompt. Time pressure attacks the Ability axis: lower verification's Ability cost, don't raise motivation.
Behavioural backbone · addresses the named obstacle
Performance support
Five Moments of Need
Training addresses New and More; performance support addresses Apply, Solve, Change. Most things shouldn't be a course.
Discipline · keeps work in real journalism
Evaluation
Reverse Kirkpatrick
Start at the desired Level 4 result and design backwards. Produces a credible lead measure, distinct from the lagging result.
Measurement architecture · designed backwards
The full set, with reasons:
- Sense-making — Cynefin (Snowden, 2007). Classifies the problem as Complex. Rules out a course-led intervention; commits to probe-sense-respond.
- Behaviour — Tiny Habits / B=MAP (Fogg, 2019). Behaviour = Motivation × Ability × Prompt. Time pressure attacks Ability; lower verification's ability cost rather than raise motivation.
- Performance support — Five Moments of Need (Mosher & Gottfredson, 2009) + 70-20-10. This goal lives in Apply / Solve / Change, not New / More. Most learning happens in real journalism, not in a course.
- Discipline — Action Mapping (Moore, 2008). Every activity collapses to observable on-the-job behaviour. No “understand AI” activities.
- Framing — Self-Determination Theory (Deci & Ryan, 1985). Autonomy, competence, relatedness. Craft augmentation, not policing.
- Evaluation — Reverse Kirkpatrick. Start from Level 4 (no AI-induced editorial errors) and design backwards to leading-measure architecture.
Not used: ADKAR, Kotter's 8 Steps. Both assume a stable end state. AI tooling isn't stabilising; ADKAR's Reinforcement phase can't lock in a behaviour the field will keep updating.
Activity composition (within the org's allowed palette)
The organisation's vocabulary of permitted activity types:
1. Asynchronous interactive activities · 2. Collaborative trio cards · 3. Submitted assignments to manager · 4. Briefing docs or videos · 5. Small sub-team meetings of 5 · 6. Recorded webinars.
The composed outline as it appears in CoNoggin — eleven activities in the goal's bucket:
Goal · Newsroom Investigations · 12 weeks
AI-assisted research with verification standards intact
- Briefing — Verification under AI: what counts
- Verification reflection (private)
- Webinar — kickoff with senior editor + Q&A
- Async — the plausible-but-wrong walk
- Async — the plausible-but-wrong simulator
- Weekly verification capture (lead measure)
- Trio AI-trace audit
- Sub-team pre-mortem
- Submitted assignment to manager
- Sub-team wrap — what stuck, what didn't
- Async — what changed
The full table, with what each activity does:
| # | Activity | What it does |
|---|---|---|
| 1 | Briefing doc/video (week 0) | Verification under AI: what counts. Five-min video + one-page doc from leadership. |
| 2 | Async — verification reflection (week 0, private) | Where my habit is strongest, where it might slip. Stays in the journalist's vault. |
| 3 | Webinar recorded (week 1) | 45-min kickoff. Senior editor + a journalist using AI tools well + Q&A. |
| 4 | Async — the plausible-but-wrong walk (week 1) | 15-min self-paced module. Three composite cases on the team's beat. Reader makes choices, sees consequences. Revisitable. |
| 5 | Async — the plausible-but-wrong simulator (weekly) | 5-min weekly. The system gives confident, partly-fabricated answers; the journalist catches and traces. |
| 6 | Async — weekly verification capture (weeks 2–11) | Three lines per week. One AI-assisted piece, what was verified, what almost slipped. The capture is the lead measure. |
| 7 | Trio audit (weeks 3, 5, 7, 9) | Trios audit one of each member's AI-assisted pieces against a rubric. Roles rotate. |
| 8 | Sub-team pre-mortem (week 5) | 60-min per sub-team. What does it look like when AI-induced error reaches broadcast? Risk register produced. |
| 9 | Submitted assignment to manager (week 8) | One real piece with the verification trace shown explicitly. Manager reviews. |
| 10 | Sub-team wrap (week 12) | 30-min per sub-team. What stuck, what didn't, carry-forward proposals. |
| 11 | Async — what changed (week 12) | Self-rated habit shift; named verification patterns; opt-in knowledge-card. |
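Checking a composed outline against the organisation's palette is a mechanical step. The sketch below assumes hypothetical identifiers for the six permitted activity types and a hand-made mapping of the eleven activities onto them; none of these names come from CoNoggin itself. It shows that every activity in the outline resolves to a permitted type, and that an off-palette activity would be flagged.

```python
from collections import Counter

# Hypothetical identifiers for the six permitted activity types listed above.
ALLOWED_TYPES = {
    "async_interactive",
    "trio_card",
    "manager_assignment",
    "briefing",
    "sub_team_meeting",
    "recorded_webinar",
}

# (activity number, short name, assumed palette type) for the eleven activities.
OUTLINE = [
    (1, "Briefing: verification under AI", "briefing"),
    (2, "Verification reflection (private)", "async_interactive"),
    (3, "Webinar kickoff", "recorded_webinar"),
    (4, "Plausible-but-wrong walk", "async_interactive"),
    (5, "Plausible-but-wrong simulator", "async_interactive"),
    (6, "Weekly verification capture", "async_interactive"),
    (7, "Trio AI-trace audit", "trio_card"),
    (8, "Sub-team pre-mortem", "sub_team_meeting"),
    (9, "Submitted assignment to manager", "manager_assignment"),
    (10, "Sub-team wrap", "sub_team_meeting"),
    (11, "What changed", "async_interactive"),
]


def off_palette(outline, allowed):
    """Return the names of activities whose type is not in the allowed palette."""
    return [name for _, name, kind in outline if kind not in allowed]


def type_counts(outline):
    """Count how many activities use each palette type."""
    return Counter(kind for _, _, kind in outline)


if __name__ == "__main__":
    print("Off-palette activities:", off_palette(OUTLINE, ALLOWED_TYPES))
    print("Composition by type:", dict(type_counts(OUTLINE)))
```

Under this mapping, five of the eleven activities are asynchronous, which matches the async-heavy backbone the outline describes.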
Measurement
| Activity # | Measure | Success threshold |
|---|---|---|
| 1 | Opened; ack-line submitted | ≥80% / ≥75% |
| 2 | Completion | ≥80% week 1 |
| 3 | Attended-or-watched in 7d; capture submitted | ≥90% / ≥75% |
| 4 | Completion; failure modes flagged | ≥85% / ≥3 of 5 first pass |
| 5 | Weekly completion; per-failure-mode catch rate | ≥70% sustained; catch rate ≥80% by w6, ≥90% by w11 |
| 6 | Weekly submission rate per person (lead measure) | ≥75% sustained |
| 7 | Audits submitted; rubric scores | ≥90% complete; scores trending upward |
| 8 | Session held; risk register | 6/6 sub-teams; ≥5 named failure modes |
| 9 | Submission (gate); rubric scores | 100%; ≥80% meets-standard |
| 10 | Session held; carry-forward proposals | 6/6 sub-teams; ≥3 each |
| 11 | Completion; self-rated habit shift (1–5) | ≥85% / ≥70% rate ≥3 |
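As one worked example of a threshold from the table, the simulator milestones for activity 5 (catch rate ≥80% by week 6, ≥90% by week 11) can be checked as below. This is a sketch under stated assumptions: "by week N" is read as the catch rate recorded in week N, the function names are hypothetical, and the counts in the usage example are invented for illustration, not results from the prototype.

```python
# Milestones from the measurement table: week number -> minimum catch rate.
MILESTONES = {6: 0.80, 11: 0.90}


def catch_rate(caught, presented):
    """Share of fabricated answers the cohort caught in one week."""
    if presented <= 0:
        raise ValueError("presented must be positive")
    return caught / presented


def milestones_met(weekly_counts):
    """Map each milestone week to True/False.

    weekly_counts: dict of week -> (caught, presented).
    A milestone week with no data counts as not met.
    """
    results = {}
    for week, target in MILESTONES.items():
        if week not in weekly_counts:
            results[week] = False
        else:
            caught, presented = weekly_counts[week]
            results[week] = catch_rate(caught, presented) >= target
    return results


if __name__ == "__main__":
    # Invented illustration data: (caught, presented) per week.
    example = {6: (41, 50), 11: (44, 50)}
    print(milestones_met(example))
```

With these invented counts, week 6 passes (82%) and week 11 falls short (88%), the kind of exception the manager would be asked to look at.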
Intervention judgment (Reverse Kirkpatrick)
Four headline numbers:
- L4 — Results. Zero AI-induced editorial errors reach broadcast in the 12-week window or the 12 weeks following. (Lagging.)
- L3 — Behaviour. Weekly capture submission rate ≥75% sustained AND week-8 exemplar submissions ≥80% meets-standard.
- L2 — Learning. Simulator catch rate ≥90% by week 11.
- L1 — Reaction. Week-12 self-rating ≥3 from ≥70% of cohort + qualitative wrap-session sentiment positive on craft-augmentation framing.
Successful if L1–L3 hit and L4 holds across the trailing window.
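The judgment rule can be written down directly. The sketch below is one possible encoding of the four headline numbers, assuming a hypothetical `Outcomes` record and `judge` function; the wrap-session sentiment is reduced to a single yes/no flag, which is a simplification of a qualitative read.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Observed results for one cohort (hypothetical record)."""
    errors_reaching_broadcast: int    # L4: 12-week window plus the 12 weeks following
    capture_rate: float               # L3: weekly capture submission rate, sustained
    exemplar_meets_standard: float    # L3: share of week-8 submissions meeting standard
    simulator_catch_rate_w11: float   # L2: simulator catch rate at week 11
    share_self_rating_3_plus: float   # L1: share of cohort self-rating 3 or above
    wrap_sentiment_positive: bool     # L1: qualitative read, simplified to a flag


def judge(o):
    """Apply the thresholds above; successful only if L1 to L3 hit and L4 holds."""
    l4 = o.errors_reaching_broadcast == 0
    l3 = o.capture_rate >= 0.75 and o.exemplar_meets_standard >= 0.80
    l2 = o.simulator_catch_rate_w11 >= 0.90
    l1 = o.share_self_rating_3_plus >= 0.70 and o.wrap_sentiment_positive
    return {
        "L1": l1,
        "L2": l2,
        "L3": l3,
        "L4": l4,
        "successful": l1 and l2 and l3 and l4,
    }


if __name__ == "__main__":
    # Invented illustration values, not results from the prototype.
    print(judge(Outcomes(0, 0.78, 0.83, 0.91, 0.72, True)))
```

Keeping the four levels as separate flags means a miss at one level stays visible instead of disappearing into a single pass/fail.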
What worked
- The nine-parameter pass produced a clear methodology pick from a 200-word goal + standing context. No further intake form.
- Productive friction between methodologies earned its keep. Each cluster compensated for another's weakness — Tiny Habits without sense-making becomes habit-formation that misses the domain; sense-making without Tiny Habits is “let's experiment” with no behavioural backbone.
- The org's allowed activity palette kept the composition realistic without changing the methodology stance. Same intervention, different organisation, different activity composition. The stance is universal; the composition is local.
- The lead measure (weekly verification capture) carried the story. Designed to survive the same constraint the goal does — three lines, not a form.
What didn't, or wobbled
- Two soft-confidence reads (audience starting state; time horizon) needed one targeted clarifying question rather than gating the goal author. Worth designing in as a fallback, not as a default.
- First-pass tendency to over-instrument the measurement — quality-scoring every text artefact across the cohort. Real cost; rarely acted on. Simpler architecture: capture the simple thing; flag exceptions for the manager; let human judgement do quality work.
- Initial activity composition over-reached on AI-native activities (live group choreography, weekly synchronous role-plays). The org-palette constraint forced honesty; the async-heavy backbone is more credible. Adding the role-play and scenario-walk back as async-interactive flavours kept the training value at lower cost.
Conclusions (for now)
- Methodology stance is universal; activity composition is local. Two distinct problems, both required.
- One question shapes everything downstream — what kind of problem is this? Get this wrong and every methodology choice that follows is misapplied.
- Surface stays human; substrate stays disciplined. Goal is articulated as narrative; activity outline is deterministic; measurement is structured. None competes with the others when they're properly separated.
- The simplicity imperative applies to measurement, not just the intervention. The measurement has to survive the same constraint the intervention does.