Campaign baseline comparison checklist

A campaign result is only as useful as the comparison underneath it. Before a report says a campaign drove, lifted, improved, or outperformed anything, the reader needs to know which baseline the result is being compared against.

Use this checklist when a campaign readout, vendor deck, dashboard, or renewal memo turns observed performance into a stronger claim. The goal is to separate clean descriptive reporting from directional evidence and from causal lift language.

Baseline strength ladder

Not every decision needs a randomized test, but every decision needs a visible comparison. Start by naming the baseline type before debating the conclusion.

| Comparison type | Useful for | Main weakness | Careful claim |
| --- | --- | --- | --- |
| No explicit baseline | Confirming delivery, tracking, and observed response. | No way to separate campaign response from normal demand. | The campaign delivered measured activity under the stated tracking rule. |
| Prior-period baseline | Fast readouts when seasonality, pricing, traffic mix, and tracking are stable. | The prior period may be a convenient chart choice rather than a fair counterfactual. | Observed outcomes were higher or lower than the selected prior period. |
| Matched context or audience baseline | Directional comparisons across similar pages, packages, audiences, or placements. | Matched groups can still differ in intent, reachability, creative, or destination quality. | The result is directionally stronger than the matched comparison, subject to matching limits. |
| Matched market or geo baseline | Market-level launches where user-level holdouts are impractical. | Pre-period trend, local seasonality, and concurrent changes can explain the gap. | The treatment markets outperformed the selected comparison markets within this design. |
| Protected holdout | Estimating incremental effect when assignment and suppression can be defended. | Leakage, noncompliance, underpowered cells, or narrow samples can weaken generalization. | The campaign produced measured lift for this audience, window, and uncertainty range. |
| Model baseline calibrated with experiments | Budget planning across channels, time, and business drivers. | Model assumptions and calibration quality determine how causal the estimate deserves to sound. | Multiple evidence streams support a bounded planning range. |
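
For the protected-holdout row, the "measured lift" and "uncertainty range" in the careful claim can be made concrete with a small calculation. The sketch below is one possible approach, not a prescribed method: it assumes a simple two-group conversion comparison and a normal-approximation interval, and the function name `holdout_lift` and all counts are hypothetical.

```python
import math


def holdout_lift(treat_conv, treat_n, ctrl_conv, ctrl_n, z=1.96):
    """Absolute lift in conversion rate with a normal-approximation interval.

    Assumes independent treatment and holdout groups; z=1.96 gives a
    roughly 95% interval.
    """
    p_t = treat_conv / treat_n
    p_c = ctrl_conv / ctrl_n
    diff = p_t - p_c
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / treat_n + p_c * (1 - p_c) / ctrl_n)
    return {
        "treatment_rate": p_t,
        "control_rate": p_c,
        "absolute_lift": diff,
        "interval": (diff - z * se, diff + z * se),
    }


# Hypothetical counts, for illustration only.
result = holdout_lift(treat_conv=540, treat_n=20000, ctrl_conv=460, ctrl_n=20000)
low, high = result["interval"]
print(f"lift = {result['absolute_lift']:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Reporting the interval alongside the point estimate keeps the claim tied to this audience and window. An interval that includes zero, or that sits below the effect size that would change the decision, argues for softer wording.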

Build the comparison packet

Ask for the packet before accepting the baseline. A weak comparison often looks strong because the report hides the fields that would make it testable.

Decision and claim

The report should state the budget, renewal, creative, package, or test decision the result is meant to inform, plus the exact sentence the evidence is supposed to support.

Primary outcome

Name the preselected outcome, denominator, event rule, conversion window, lead-quality filter, and reporting cutoff. If the primary outcome changed after results were visible, downgrade the claim.

Eligible population

Show who or what could enter the campaign and the comparison: users, households, markets, stores, pages, placements, devices, leads, or matched outcomes.

Pre-period evidence

Show levels and trends before launch. A matched group is weak if it only matches on size while demand, traffic quality, or outcome mix was moving differently.

Concurrent changes

List pricing, promotions, sales coverage, site changes, creative changes, inventory shifts, news-cycle effects, and other campaigns that changed during the readout window.

Exposure isolation

For holdouts or matched groups, show suppression logs, audience overlap, control exposure, channel leakage, and whether excluded users could still see the campaign elsewhere.
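
The six packet fields above can be tracked as a simple record, so that gaps are visible before the baseline is debated. This is a minimal sketch: the class name `ComparisonPacket`, the helper `missing_fields`, and the sample values are hypothetical, and each field is treated as free text.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ComparisonPacket:
    """One free-text entry per section of the comparison packet."""
    decision_and_claim: str = ""
    primary_outcome: str = ""
    eligible_population: str = ""
    pre_period_evidence: str = ""
    concurrent_changes: str = ""
    exposure_isolation: str = ""


def missing_fields(packet: ComparisonPacket) -> List[str]:
    """Return the names of packet fields that are empty or whitespace-only."""
    return [f.name for f in fields(packet) if not getattr(packet, f.name).strip()]


# Hypothetical packet with only the first three sections supplied.
packet = ComparisonPacket(
    decision_and_claim="Renewal decision; claim: outperformed the matched comparison.",
    primary_outcome="Qualified leads per 1,000 eligible users, 14-day window.",
    eligible_population="Logged-in users in the target segment.",
)
print(missing_fields(packet))
```

A field such as concurrent changes should be filled in even when the answer is "none identified", so that an empty entry always means the question was not asked.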

Baseline fit test

Use these checks before treating a comparison as fair enough for the decision.

| Fit check | Ask for | Downgrade when |
| --- | --- | --- |
| Level alignment | Pre-period outcome levels by treatment and comparison group. | The baseline group starts far above or below treatment without adjustment or explanation. |
| Trend alignment | Pre-period slopes across several comparable windows. | The groups were already diverging before the campaign started. |
| Mix alignment | Device, geography, audience, placement, creative, product, and destination mix. | The treatment group received a materially easier or harder mix. |
| Opportunity alignment | Eligibility, reachability, inventory availability, frequency, and bid conditions. | The comparison group had less opportunity to be reached or convert. |
| Tracking alignment | Tag status, event rules, attribution windows, match rates, and data-lag cutoffs. | One side has better measurement coverage than the other. |
| Outcome maturity | Conversion lag, sales follow-up status, brand-study field dates, and late-arriving outcomes. | The readout compares mature outcomes with immature outcomes. |
| Decision threshold | The minimum effect, cost, quality, or uncertainty threshold that would change the decision. | The report celebrates a difference too small or too uncertain to act on. |
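
The level and trend checks can be run on pre-period series before anyone looks at campaign-window results. The sketch below is illustrative only: the function names, the weekly figures, and the tolerances `level_tol` and `trend_tol` are assumptions, and a real review would set its thresholds in advance for the decision at hand.

```python
def slope(values):
    """Ordinary least-squares slope of values against period index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pre-period points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def pre_period_fit(treatment, comparison, level_tol=0.10, trend_tol=0.02):
    """Compare pre-period level and trend for two groups.

    level_tol and trend_tol are hypothetical tolerances, not standards.
    """
    t_mean = sum(treatment) / len(treatment)
    c_mean = sum(comparison) / len(comparison)
    # Relative gap in average pre-period level.
    level_gap = abs(t_mean - c_mean) / c_mean
    # Slopes are scaled by each group's own mean so that groups of
    # different size can be compared as growth per period.
    trend_gap = abs(slope(treatment) / t_mean - slope(comparison) / c_mean)
    return {
        "level_gap": level_gap,
        "trend_gap": trend_gap,
        "level_ok": level_gap <= level_tol,
        "trend_ok": trend_gap <= trend_tol,
    }


# Hypothetical weekly conversions before launch.
treatment = [100, 104, 108, 112, 116, 120]
comparison = [107, 108, 106, 109, 107, 108]
report = pre_period_fit(treatment, comparison)
print(report)
```

In this made-up example the two groups are close in average level, but the treatment group was already growing faster before launch, which is the pattern the trend-alignment row says to downgrade.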

Language by evidence level

The same table can support very different wording depending on the baseline. Keep the conclusion inside the comparison.

| Evidence in hand | Use this language | Avoid this language |
| --- | --- | --- |
| Tracked response only | The campaign produced observed visits, leads, or matched outcomes under the stated tracking rule. | The campaign created new demand. |
| Prior-period comparison | Observed outcomes were higher than the selected prior period, with these known context changes. | The campaign caused the full before-and-after increase. |
| Matched baseline | The campaign outperformed the matched comparison on this outcome, with remaining mix and selection limits. | The matched result proves incremental lift. |
| Protected holdout with leakage checks | The campaign produced measured lift within the design, window, sample, and uncertainty range. | The result will generalize to every future campaign. |
| Model plus calibration evidence | The model and calibration evidence support this planning direction, with stated assumptions and sensitivity. | The model has proven exact channel contribution. |
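
A draft sentence can be given a rough first-pass screen against the evidence level before a human review. The sketch below is a simple keyword check under stated assumptions: the term list, the evidence labels, and the function name `flag_overclaims` are hypothetical. It cannot judge meaning, so it would not catch a claim such as "will generalize to every future campaign".

```python
# Hypothetical list of terms that signal causal or proof language.
CAUSAL_TERMS = ("caused", "drove", "created", "proves", "proven", "incremental", "lift")

# Evidence labels (assumed names) under which measured-lift wording is acceptable.
LIFT_EVIDENCE = {"protected_holdout"}


def flag_overclaims(sentence, evidence):
    """Return causal terms in the sentence that the evidence level does not support."""
    lowered = sentence.lower()
    hits = [term for term in CAUSAL_TERMS if term in lowered]
    if evidence in LIFT_EVIDENCE:
        # A defended holdout supports lift wording within its design,
        # but not proof or blanket causal wording.
        hits = [term for term in hits if term not in ("lift", "incremental")]
    return hits


print(flag_overclaims("The matched result proves incremental lift.", "matched_baseline"))
print(flag_overclaims(
    "The campaign produced measured lift within the design, window, sample, "
    "and uncertainty range.",
    "protected_holdout",
))
```

A flagged term is a prompt to reread the sentence against the table, not an automatic rejection.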

Common baseline traps

  • Choosing the weakest prior week, month, or quarter after the campaign result is known.
  • Comparing a high-intent campaign audience with all site visitors, all customers, or all reachable households.
  • Pooling placements, devices, or creative units when one side had better visibility, lower friction, or stronger destination quality.
  • Using attribution-window credit as if it were a comparison group.
  • Calling a matched market valid because it is similar in size while pre-period trends point in different directions.
  • Ignoring late-arriving conversions, lead disqualification, sales follow-up gaps, or data-lag cutoffs.
  • Treating a model baseline as objective without showing controls, calibration evidence, uncertainty, and sensitivity.

Meeting script

  • What is the exact baseline used in this readout?
  • Was the comparison chosen before results were visible?
  • Do treatment and comparison groups align on pre-period level, trend, mix, and tracking quality?
  • What changed during the campaign window that could explain the same outcome?
  • Which claim survives if attributed outcomes are described as observed response instead of lift?
  • Does the next decision need descriptive optimization, a matched baseline, a protected holdout, or a calibrated model?

Pair with

Use this checklist with:

  • The campaign readout QA checklist, when a finished report needs evidence-level language.
  • The measurement method selector, when the question may need a different method.
  • The incrementality test plan template, before a causal test is launched.
  • The comparison market and holdout planning guide, before selecting markets or controls.
  • The randomized lift test readout checklist, when assignment is controlled.
  • The holdout leakage and suppression QA checklist, when the control group may have been exposed.
  • The uncertainty interval readout checklist, when a point estimate needs a decision range.
  • The outcome quality scorecard, before volume becomes a value claim.
  • The evidence-to-claim language matrix, when the final sentence needs safer wording.