
MMM calibration evidence checklist

Calibration is the bridge between a marketing mix model and real-world experiments. It can make a model more useful, but only when the evidence being used as an anchor actually matches the effect the model is asked to estimate.

Use this checklist before a lift test, geo experiment, holdout, brand study, or external benchmark is used to tune a model, validate a channel contribution, or defend a budget recommendation. A calibration point is not a trophy. It is evidence with a unit, population, outcome, window, uncertainty, and decision boundary.


Start with the calibration job

Teams often say an MMM is calibrated without saying what the calibration is supposed to do. Name the job before accepting the evidence.

| Calibration job | Evidence needed | Weak substitute |
| --- | --- | --- |
| Anchor one channel's incremental effect. | A credible test for the same channel, audience, market, outcome, and time window the model estimates. | A platform lift number from a different audience, season, or outcome treated as a universal truth. |
| Constrain a response curve. | Tests or historical spend variation near the current and proposed spend range. | A single average return applied to every future budget level. |
| Set priors for a sparse channel. | Transparent prior source, relevance test, uncertainty range, and sensitivity run. | A benchmark inserted without showing how much it drives the answer. |
| Validate channel ranking. | Multiple calibration points or sensitivity checks showing whether rank survives uncertainty. | One favorable experiment used to justify a full channel order. |
| Plan the next test. | A model uncertainty that maps to a practical audience, market, outcome, and detectable effect. | A generic plan to run a test after the budget decision has already been made. |
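
One way to check whether a channel ranking survives uncertainty is to simulate returns from each channel's estimate and standard error and count how often the order holds. The sketch below is a minimal illustration, not a prescribed method. The function name `rank_stability`, the channel names, and all numbers are hypothetical, and it assumes roughly normal, independent estimates.

```python
import random

def rank_stability(channels, draws=10_000, seed=0):
    """Share of simulated draws whose channel order matches the point-estimate order.

    channels maps a channel name to (estimate, standard_error).
    Assumes independent, approximately normal estimates (an illustration only).
    """
    rng = random.Random(seed)
    names = list(channels)
    # Order implied by the point estimates alone, highest return first.
    base_order = sorted(names, key=lambda n: channels[n][0], reverse=True)
    same = 0
    for _ in range(draws):
        sample = {n: rng.gauss(channels[n][0], channels[n][1]) for n in names}
        order = sorted(names, key=lambda n: sample[n], reverse=True)
        same += order == base_order
    return same / draws

# Hypothetical returns and standard errors, for illustration only.
example = {"search": (2.0, 0.5), "social": (1.8, 0.5), "video": (1.0, 0.3)}
print(f"Order holds in {rank_stability(example):.0%} of draws")
```

A low share suggests the readout should present the ranking as uncertain, not as a settled order.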

Calibration packet

Ask for this packet before debating whether the model has been validated. The point is to make the calibration evidence auditable.

Evidence source

Test owner, method, field dates, channel, media tactic, geography, audience, creative mix, delivery quality, and whether the readout was completed before model fitting.

Estimand match

The effect the evidence estimates and the effect the MMM estimates. Check whether both use incremental revenue, conversions, qualified leads, store visits, awareness, or another outcome.

Unit and population

User, household, account, market, store, region, or time-series unit, plus the eligible population. A user-level test may not cleanly calibrate a market-level model without careful translation.

Window alignment

Exposure window, outcome window, conversion lag, carryover, seasonality, and whether the MMM's adstock assumptions line up with the experiment's readout period.
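
To see why the readout period matters, consider how much of a carryover effect a short test can observe. The sketch below assumes a simple geometric adstock with a fixed weekly retention rate and a one-time exposure in week 0. Your MMM may use a different carryover form, and the name `captured_share` and the numbers are hypothetical.

```python
def captured_share(retention, weeks_observed):
    """Share of a one-time exposure's total geometric-adstock effect that
    falls inside the first `weeks_observed` weeks (week 0 included)."""
    if not 0 <= retention < 1:
        raise ValueError("retention must be in [0, 1)")
    if weeks_observed < 0:
        raise ValueError("weeks_observed must be non-negative")
    # Finite sum of retention**k for k < weeks_observed, divided by the
    # infinite sum 1 / (1 - retention), simplifies to 1 - retention**weeks.
    return 1 - retention ** weeks_observed

# Illustrative values: 70% weekly retention, 4-week readout.
share = captured_share(retention=0.7, weeks_observed=4)
print(round(share, 4))  # 0.7599
```

Under these assumptions, a four-week readout would see about three quarters of the modeled effect, so comparing the test estimate directly with the model's long-run contribution would understate the match.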

Uncertainty

Confidence or credible interval, minimum detectable effect, sample size, leakage risk, and whether the model uses the calibration point as a tight anchor or a broad prior.
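
The difference between a tight anchor and a broad prior can be made concrete with a normal-normal update, in which the calibration evidence acts as the prior and the model's own data acts as the likelihood. This is a simplified sketch, not how any particular MMM package applies calibration. The function name `combine` and all numbers are hypothetical.

```python
def combine(prior_mean, prior_sd, data_mean, data_sd):
    """Posterior mean and sd for a normal prior combined with a normal likelihood."""
    prior_prec = 1 / prior_sd ** 2   # precision = 1 / variance
    data_prec = 1 / data_sd ** 2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var ** 0.5

# Illustrative: a lift test suggests a return of 2.0; the model data alone suggest 1.0 (sd 0.5).
tight_mean, tight_sd = combine(prior_mean=2.0, prior_sd=0.1, data_mean=1.0, data_sd=0.5)
broad_mean, broad_sd = combine(prior_mean=2.0, prior_sd=1.0, data_mean=1.0, data_sd=0.5)
print(round(tight_mean, 2), round(broad_mean, 2))  # 1.96 1.2
```

With the tight prior the result sits almost on the test value. With the broad prior it stays close to what the model data say, which is why the prior width belongs in the packet.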

Influence check

Model results with and without the calibration evidence, plus sensitivity to reasonable alternative priors or calibration weights.
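
An influence check can be organized as a leave-one-out loop over the calibration points. In the sketch below, `fit` stands for whatever refits your model and returns the headline estimate. The `toy_fit` stand-in uses inverse-variance pooling only so the example runs on its own, and every name and number is hypothetical.

```python
def influence_table(fit, calibration_points):
    """Return the baseline result and the shift caused by dropping each point.

    fit: callable taking a dict of calibration points and returning a number.
    calibration_points: dict mapping a label to (estimate, standard_error).
    """
    baseline = fit(calibration_points)
    shifts = {}
    for label in calibration_points:
        reduced = {k: v for k, v in calibration_points.items() if k != label}
        shifts[label] = fit(reduced) - baseline
    return baseline, shifts

def toy_fit(points, model_only=(1.0, 0.5)):
    """Stand-in for a real model refit: inverse-variance pooling of a
    model-only estimate with the supplied calibration points."""
    pairs = [model_only, *points.values()]
    weights = [1 / se ** 2 for _, se in pairs]
    return sum(w * est for w, (est, _) in zip(weights, pairs)) / sum(weights)

# Hypothetical calibration points, for illustration only.
points = {"geo_test": (2.0, 0.25), "platform_lift": (1.4, 0.5)}
baseline, shifts = influence_table(toy_fit, points)
print(round(baseline, 3))
for label, shift in shifts.items():
    print(label, round(shift, 3))
```

A point whose removal moves the answer substantially deserves the closest scrutiny of its evidence source, estimand match, and window alignment.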

Evidence match matrix

| Evidence type | Strong fit for MMM calibration | Downgrade when |
| --- | --- | --- |
| Randomized conversion lift test | Assignment was protected, outcome source matches the MMM outcome, and the tested population is close to the modeled population. | Control leakage, post-assignment filtering, narrow retargeting eligibility, or a proxy outcome makes the test estimate a different effect. |
| Geo or store lift test | Treatment and control markets had similar pre-period trends, logged operating changes, and an outcome that maps to the model grain. | Markets were chosen after results were visible, local shocks were unlogged, or the test window is too short for the modeled carryover. |
| Matched-market comparison | Matching rule was locked before launch and sensitivity checks show the estimate is not driven by one convenient control. | The comparison is mainly size-matched while trend, seasonality, distribution, or competitor pressure differ. |
| Brand lift study | The model includes a brand or demand proxy and the survey design has balanced exposed and control respondents. | Survey lift is used to calibrate sales, revenue, or profit without evidence connecting the perception outcome to business impact. |
| External benchmark | The benchmark is used as a wide prior, with source, category, channel, outcome, and uncertainty visible. | It replaces internal evidence or forces a channel contribution because the model is otherwise unstable. |

Red flags

  • The calibration section says "validated by lift tests" but does not name the tests, dates, audiences, outcomes, intervals, or model influence.
  • A narrow platform test calibrates a broad channel that includes prospecting, retargeting, brand search, partner media, or different creative.
  • A short-window experiment is used to confirm a long-run response curve without showing lag or carryover sensitivity.
  • A brand metric is used to anchor sales contribution without explaining the bridge from perception to business outcome.
  • The model recommendation changes materially when calibration priors are loosened, but the readout still presents one firm budget answer.

How to write the calibration note

A useful readout makes the calibration note short, specific, and bounded. It should say what evidence was used, how it entered the model, how much it changed the answer, and where it should not be generalized.

| Evidence condition | Careful wording | Wording to avoid |
| --- | --- | --- |
| Relevant test with clean assignment and aligned outcome. | The model is calibrated to experimental evidence for this channel, population, and outcome range. | The model is proven correct. |
| Relevant but noisy test. | The test informs the prior, but uncertainty remains wide enough to limit budget-change confidence. | The test validates the channel return. |
| Different audience or outcome. | The evidence is directional because it estimates a related but different effect. | The same lift applies to the modeled channel. |
| External benchmark only. | The benchmark supplies a broad plausibility range until internal experiments are available. | The benchmark confirms expected performance. |
| Conflicting calibration points. | The model should show sensitivity and identify the next testable uncertainty. | The average of conflicting tests settles the question. |
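
Before deciding whether two calibration points conflict, it can help to compare their difference with their combined uncertainty. The sketch below is one simple screen, assuming independent tests with approximately normal estimates. The names `conflict_z` and `points_conflict`, the 1.96 threshold, and the numbers are illustrative choices, not a standard from this checklist.

```python
import math

def conflict_z(est_a, se_a, est_b, se_b):
    """z-score for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def points_conflict(a, b, threshold=1.96):
    """True when two (estimate, standard_error) pairs differ by more than
    their combined uncertainty would comfortably explain."""
    return abs(conflict_z(*a, *b)) > threshold

# Hypothetical test results, for illustration only.
print(points_conflict((2.0, 0.2), (1.0, 0.3)))  # True
print(points_conflict((2.0, 0.5), (1.5, 0.5)))  # False
```

When the screen flags a conflict, averaging the two values hides the disagreement. The more useful step is to ask which difference in audience, window, or outcome could explain it.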

Meeting questions

  • Which calibration points were used, and which were rejected?
  • Do the calibration points estimate the same outcome, population, and time window as the model?
  • How does the model result change when each calibration point is removed?
  • Are calibration points treated as tight constraints or broad priors?
  • Which channel recommendation depends most on calibration assumptions?
  • What experiment would most reduce uncertainty before the next budget decision?

Pair with

Use this checklist with the MMM causal validity checklist before accepting model contribution claims, and with the MMM readout QA checklist before allocating budget. When calibration evidence comes from experiments, pair it with the randomized lift test readout checklist, the geo lift test design checklist, and the comparison market and holdout planning guide. Use the source library for official references on MMM, outcomes, attention, and measurement quality.