
MMM calibration evidence checklist

Published July 3, 2026. Updated July 7, 2026. Status: evergreen source page.

Calibration is the bridge between a marketing mix model and real-world experiments. It can make a model more useful, but only when the evidence being used as an anchor actually matches the effect the model is asked to estimate.

Use this checklist before a lift test, geo experiment, holdout, brand study, or external benchmark is used to tune a model, validate a channel contribution, or defend a budget recommendation. A calibration point is not a trophy. It is evidence with a unit, population, outcome, window, uncertainty, and decision boundary.

The most common failure is not that a team ignores experiments. It is that the experiment answers a narrower question than the model readout. A clean retargeting lift test, for example, can be useful evidence without being a clean anchor for the full paid social contribution curve.

[Figure: Measurement desk comparing a marketing mix model output, lift test packet, calendar window, and audience sheet before a budget decision.] Calibration starts by checking whether the model answer and the experiment answer describe the same decision. Channel, audience, outcome, time window, and uncertainty need to line up before evidence becomes an anchor.

Start with the calibration job

Teams often say an MMM is calibrated without saying what the calibration is supposed to do. Name the job before accepting the evidence.

Calibration job	Evidence needed	Weak substitute
Anchor one channel's incremental effect.	A credible test for the same channel, audience, market, outcome, and time window the model estimates.	A platform lift number from a different audience, season, or outcome treated as a universal truth.
Constrain a response curve.	Tests or historical spend variation near the current and proposed spend range.	A single average return applied to every future budget level.
Set priors for a sparse channel.	Transparent prior source, relevance test, uncertainty range, and sensitivity run.	A benchmark inserted without showing how much it drives the answer.
Validate channel ranking.	Multiple calibration points or sensitivity checks showing whether rank survives uncertainty.	One favorable experiment used to justify a full channel order.
Plan the next test.	A model uncertainty that maps to a practical audience, market, outcome, and detectable effect.	A generic plan to run a test after the budget decision has already been made.
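
The weak substitute in the response-curve row, a single average return applied to every budget level, can be made concrete with a small sketch. It assumes a hypothetical saturating curve of the form max_effect * spend / (half_sat + spend). The function names, curve parameters, and spend levels are illustrative assumptions, not values from any real model.

```python
def hill_response(spend: float, max_effect: float, half_sat: float) -> float:
    """Incremental outcome at a given spend on a hypothetical saturating curve."""
    if spend <= 0:
        return 0.0
    return max_effect * spend / (half_sat + spend)


def average_return(spend: float, max_effect: float, half_sat: float) -> float:
    """Total incremental outcome divided by total spend."""
    return hill_response(spend, max_effect, half_sat) / spend


def marginal_return(spend: float, max_effect: float, half_sat: float,
                    step: float = 1.0) -> float:
    """Finite-difference return on the next unit of spend."""
    gain = (hill_response(spend + step, max_effect, half_sat)
            - hill_response(spend, max_effect, half_sat))
    return gain / step


# Illustrative numbers only (assumptions, not benchmarks).
MAX_EFFECT = 1_000_000.0
HALF_SAT = 200_000.0
current_spend, proposed_spend = 100_000.0, 300_000.0

avg_now = average_return(current_spend, MAX_EFFECT, HALF_SAT)

# Naive projection: the current average return applied to the added budget.
naive_gain = avg_now * (proposed_spend - current_spend)

# Curve-based projection: the difference in response between the two spend levels.
curve_gain = (hill_response(proposed_spend, MAX_EFFECT, HALF_SAT)
              - hill_response(current_spend, MAX_EFFECT, HALF_SAT))

print(f"average return at current spend: {avg_now:.2f}")
print(f"marginal return at current spend: "
      f"{marginal_return(current_spend, MAX_EFFECT, HALF_SAT):.2f}")
print(f"marginal return at proposed spend: "
      f"{marginal_return(proposed_spend, MAX_EFFECT, HALF_SAT):.2f}")
print(f"naive projected gain: {naive_gain:,.0f}")
print(f"curve projected gain: {curve_gain:,.0f}")
```

Under these assumed numbers the naive projection overstates the gain from the added budget, because the marginal return falls as spend moves up the curve. That is why evidence near the current and proposed spend range is more useful than one average.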

Calibration packet

Ask for this packet before debating whether the model has been validated. The point is to make the calibration evidence auditable.

[Figure: Six evidence packet tiles for source, outcome match, population match, time window, uncertainty, and model influence.] A usable calibration packet keeps the evidence trail visible. The test source, estimand, population, window, uncertainty, and model influence should be inspectable without asking the analyst to reconstruct the work from memory.

Evidence source

Test owner, method, field dates, channel, media tactic, geography, audience, creative mix, delivery quality, and whether the readout was completed before model fitting.

Estimand match

The effect the evidence estimates and the effect the MMM estimates. Check whether both measure the same outcome, whether that is incremental revenue, conversions, qualified leads, store visits, awareness, or something else.

Unit and population

User, household, account, market, store, region, or time-series unit, plus the eligible population. A user-level test may not cleanly calibrate a market-level model without careful translation.

Window alignment

Exposure window, outcome window, conversion lag, carryover, seasonality, and whether the MMM's adstock assumptions line up with the experiment's readout period.

Uncertainty

Confidence or credible interval, minimum detectable effect, sample size, leakage risk, and whether the model uses the calibration point as a tight anchor or a broad prior.

Influence check

Model results with and without the calibration evidence, plus sensitivity to reasonable alternative priors or calibration weights.
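
One way to keep the packet auditable is to record it as a structured object and refuse to call it complete while any item is blank. The sketch below is a minimal illustration: the class name, field names, and example values are hypothetical and simply mirror the six items above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CalibrationPacket:
    """Hypothetical record mirroring the six packet items in this checklist."""
    evidence_source: Optional[str] = None
    estimand_match: Optional[str] = None
    unit_and_population: Optional[str] = None
    window_alignment: Optional[str] = None
    uncertainty: Optional[str] = None
    influence_check: Optional[str] = None

    def missing_items(self) -> List[str]:
        """Names of packet items that are absent or blank, in checklist order."""
        return [
            f.name for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def is_auditable(self) -> bool:
        """True only when every packet item has been filled in."""
        return not self.missing_items()


# Example values are invented for illustration.
packet = CalibrationPacket(
    evidence_source="Randomized conversion lift test, completed before model fitting",
    estimand_match="Incremental conversions in both the test and the MMM",
    unit_and_population="User-level test; MMM is market-level, translation documented",
    window_alignment="Four-week exposure, two-week outcome lag",
    uncertainty="Interval reported; entered as a broad prior",
    # influence_check intentionally left out
)

print(packet.is_auditable())
print(packet.missing_items())
```

A blank item does not make the evidence wrong. It means the debate about validation should wait until the packet is complete.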

Influence and sensitivity check

A calibration point can look authoritative because it comes from an experiment. That does not mean it should be allowed to steer the whole model. For any channel recommendation that changes the budget, ask the analyst to show three readouts: one with the calibration evidence, one with the calibration prior loosened, and one with no calibration.

If the same budget move survives those views, the calibration evidence is probably improving confidence. If the decision flips, the readout should say the recommendation depends on a calibration assumption and identify the next test needed before the team acts.
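
The three views can be illustrated with a deliberately simplified sketch. It assumes the model's data-only estimate of a channel return and the calibration evidence can both be summarized as normal distributions and combined by precision weighting. Real MMMs have richer structure, and every number, name, and the breakeven threshold below are hypothetical.

```python
from typing import Dict, Optional, Tuple


def combine(data_mean: float, data_se: float,
            prior_mean: Optional[float] = None,
            prior_sd: Optional[float] = None) -> Tuple[float, float]:
    """Normal-normal update: precision-weighted mean and standard deviation.

    With no prior supplied, the data-only estimate is returned unchanged.
    """
    if prior_mean is None or prior_sd is None:
        return data_mean, data_se
    w_data = 1.0 / data_se ** 2
    w_prior = 1.0 / prior_sd ** 2
    mean = (w_data * data_mean + w_prior * prior_mean) / (w_data + w_prior)
    sd = (w_data + w_prior) ** -0.5
    return mean, sd


# Hypothetical inputs: a noisy data-only return estimate and a lift-test anchor.
DATA_MEAN, DATA_SE = 0.8, 0.5
LIFT_MEAN = 1.6
BREAKEVEN = 1.0  # assumed decision boundary for increasing budget

views: Dict[str, Tuple[float, float]] = {
    "with_calibration": combine(DATA_MEAN, DATA_SE, LIFT_MEAN, prior_sd=0.2),
    "loosened_prior": combine(DATA_MEAN, DATA_SE, LIFT_MEAN, prior_sd=0.6),
    "no_calibration": combine(DATA_MEAN, DATA_SE),
}

# The budget move is "increase" only when the estimated return clears breakeven.
decisions = {name: mean > BREAKEVEN for name, (mean, _) in views.items()}
decision_is_stable = len(set(decisions.values())) == 1

for name, (mean, sd) in views.items():
    print(f"{name}: return {mean:.2f} (sd {sd:.2f}), increase budget: {decisions[name]}")
print(f"decision stable across views: {decision_is_stable}")
```

With these assumed inputs the recommendation to increase budget holds with the anchor and with the loosened prior, but not without calibration. The readout should therefore say that the recommendation depends on the calibration assumption.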

[Figure: Side-by-side model readout paths showing a calibration anchor included and loosened before a cautious decision arrow.] The influence check asks how much the calibration point changes the answer. A recommendation that depends on one tight anchor needs narrower wording and a next-test plan, not a confident budget claim.

What changes in sensitivity	Interpretation	Decision rule
Channel rank and budget move stay stable.	The calibration point improves confidence without carrying the whole recommendation.	Use the evidence as an anchor, but still report the uncertainty range.
Rank stays stable, but return interval widens.	The direction is useful, but the amount of spend change is less certain.	Use a smaller budget move or a staged test instead of a full reallocation.
Rank flips when the prior is loosened.	The decision depends on the calibration assumption.	Downgrade the claim to directional and name the experiment needed next.
Only a narrow tactic supports the channel-wide result.	The evidence estimates a smaller population than the model contribution.	Anchor the tactic if appropriate, but do not generalize to the full channel.
External benchmark overwhelms internal history.	The prior is filling a data gap rather than validating the model.	Treat it as a broad plausibility range until internal evidence exists.
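
The first three rows of the table can be sketched as a simple comparison between an anchored readout and a loosened-prior readout. The readout layout, the function names, and the 1.25 widening tolerance are assumptions made for illustration. The returned strings are the decision rules from the table, and both readouts are assumed to cover the same channels.

```python
from typing import Dict, List, Tuple

# Hypothetical readout layout: channel -> (estimate, interval_low, interval_high)
Readout = Dict[str, Tuple[float, float, float]]


def rank_order(readout: Readout) -> List[str]:
    """Channels sorted from highest to lowest estimated return."""
    return sorted(readout, key=lambda ch: readout[ch][0], reverse=True)


def classify_sensitivity(anchored: Readout, loosened: Readout,
                         widen_tolerance: float = 1.25) -> str:
    """Map an anchored vs. loosened comparison to a decision rule from the table."""
    if rank_order(anchored) != rank_order(loosened):
        return "Downgrade the claim to directional and name the experiment needed next."
    # An interval counts as widened if it grows by more than the assumed tolerance.
    widened = any(
        (loosened[ch][2] - loosened[ch][1])
        > widen_tolerance * (anchored[ch][2] - anchored[ch][1])
        for ch in anchored
    )
    if widened:
        return "Use a smaller budget move or a staged test instead of a full reallocation."
    return "Use the evidence as an anchor, but still report the uncertainty range."


# Invented example: paid social outranks search only while the prior is tight.
anchored = {"paid_social": (1.5, 1.2, 1.8), "search": (1.2, 0.9, 1.5)}
loosened = {"paid_social": (1.1, 0.4, 1.8), "search": (1.2, 0.9, 1.5)}

rule = classify_sensitivity(anchored, loosened)
print(rule)
```

The last two rows of the table, narrow tactic evidence and a dominating external benchmark, describe evidence scope rather than numeric sensitivity, so they are left to the analyst's judgment here.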

Evidence match matrix

Evidence type	Strong fit for MMM calibration	Downgrade when
Randomized conversion lift test	Assignment was protected, outcome source matches the MMM outcome, and the tested population is close to the modeled population.	Control leakage, post-assignment filtering, narrow retargeting eligibility, or a proxy outcome makes the test estimate a different effect.
Geo or store lift test	Treatment and control markets had similar pre-period trends, logged operating changes, and an outcome that maps to the model grain.	Markets were chosen after results were visible, local shocks were unlogged, or the test window is too short for the modeled carryover.
Matched-market comparison	Matching rule was locked before launch and sensitivity checks show the estimate is not driven by one convenient control.	The comparison is mainly size-matched while trend, seasonality, distribution, or competitor pressure differ.
Brand lift study	The model includes a brand or demand proxy and the survey design has balanced exposed and control respondents.	Survey lift is used to calibrate sales, revenue, or profit without evidence connecting the perception outcome to business impact.
External benchmark	The benchmark is used as a wide prior, with source, category, channel, outcome, and uncertainty visible.	It replaces internal evidence or forces a channel contribution because the model is otherwise unstable.

Red flags

The calibration section says "validated by lift tests" but does not name the tests, dates, audiences, outcomes, intervals, or model influence.
A narrow platform test calibrates a broad channel that includes prospecting, retargeting, brand search, partner media, or different creative.
A short-window experiment is used to confirm a long-run response curve without showing lag or carryover sensitivity.
A brand metric is used to anchor sales contribution without explaining the bridge from perception to business outcome.
The model recommendation changes materially when calibration priors are loosened, but the readout still presents one firm budget answer.
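
The short-window red flag can be checked with simple arithmetic. The sketch below assumes geometric adstock, where a single exposure pulse in period 0 carries over with a fixed per-period retention rate. Under that assumption the share of the total effect that lands inside the first n periods is 1 - retention^n. The function name and retention values are illustrative, and real campaigns with exposure spread across the window need a fuller calculation.

```python
def share_captured(retention: float, window_periods: int) -> float:
    """Share of a geometric-adstock effect that falls inside a readout window.

    Assumes one exposure pulse in period 0 with weights retention**k for
    k = 0, 1, 2, ...; the window covers periods 0 .. window_periods - 1.
    """
    if not 0.0 <= retention < 1.0:
        raise ValueError("retention must be in [0, 1)")
    if window_periods < 0:
        raise ValueError("window_periods must be non-negative")
    return 1.0 - retention ** window_periods


# Sensitivity table: how much of the modeled effect a test window could observe.
for retention in (0.3, 0.5, 0.7, 0.9):
    row = ", ".join(
        f"{weeks}w: {share_captured(retention, weeks):.0%}" for weeks in (2, 4, 8)
    )
    print(f"retention {retention:.1f} -> {row}")
```

If the model assumes slow decay and the experiment read out after two periods, the test may have observed only part of the effect the model attributes to the channel. Comparing the two numbers directly would then understate the mismatch.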

[Figure: Calibration decision board sorting evidence into usable anchor, directional prior, and not ready for calibration lanes.] The final decision is not simply calibrated or uncalibrated. Strong matching evidence can anchor the model, partial evidence should become a directional prior, and mismatched evidence should trigger a new test instead of a stronger claim.

How to write the calibration note

A useful readout makes the calibration note short, specific, and bounded. It should say what evidence was used, how it entered the model, how much it changed the answer, and where it should not be generalized.

Evidence condition	Careful wording	Wording to avoid
Relevant test with clean assignment and aligned outcome.	The model is calibrated to experimental evidence for this channel, population, and outcome range.	The model is proven correct.
Relevant but noisy test.	The test informs the prior, but uncertainty remains wide enough to limit budget-change confidence.	The test validates the channel return.
Different audience or outcome.	The evidence is directional because it estimates a related but different effect.	The same lift applies to the modeled channel.
External benchmark only.	The benchmark supplies a broad plausibility range until internal experiments are available.	The benchmark confirms expected performance.
Conflicting calibration points.	The readout shows how the result changes under each calibration point and identifies the next testable uncertainty.	The average of conflicting tests settles the question.

Meeting questions

Which calibration points were used, and which were rejected?
Do the calibration points estimate the same outcome, population, and time window as the model?
How does the model result change when each calibration point is removed?
Are calibration points treated as tight constraints or broad priors?
Which channel recommendation depends most on calibration assumptions?
What experiment would most reduce uncertainty before the next budget decision?

Pair with

Use this checklist with the MMM causal validity checklist before accepting model contribution claims and the MMM readout QA checklist before budget allocation. When calibration evidence comes from experiments, pair it with the randomized lift test readout checklist, geo lift test design checklist, and comparison market and holdout planning guide. Use the source library for official references on MMM, outcomes, attention, and measurement quality.

Keep reading

Choose the next guide

After checking calibration evidence, move into model readout QA, uncertainty language, or the next experiment design before the model becomes a budget argument.

Readout QA: Test the budget claim. Check whether contribution, response curves, priors, calibration, and sensitivity can support the recommended spend move.
Uncertainty: Keep intervals in the decision. Turn noisy estimates, overlapping channel effects, and fragile rankings into bounded readout language.
Next experiment: Write the test plan. Define assignment, outcome, effect size, leakage checks, and readout rules for the uncertainty the model cannot resolve.