Incrementality test plan template

Incrementality work gets weaker when the test is treated as a reporting feature instead of a decision design. A good plan is written before results are visible. It states what decision is being made, what counterfactual is needed, how assignment works, and what result would change behavior.

This template is for marketing teams comparing a campaign, channel, audience, offer, creative, or retail-media package against a credible alternative. It is deliberately plain: if a result cannot pass these checks, it should not be used as a confident budget signal.

The one-page plan

| Field | What to write before launch | Failure mode it prevents |
| --- | --- | --- |
| Decision | The exact action the result will inform: increase budget, pause spend, change audience, approve a vendor, or redesign the offer. | Running a test that produces an interesting number but no operational choice. |
| Estimand | The causal effect being estimated: incremental conversions, revenue, margin, retained customers, store visits, or another outcome. | Reporting a metric that does not match the business decision. |
| Counterfactual | What would have happened without the tested treatment: no ad, business-as-usual media, different creative, different offer, or delayed launch. | Comparing treatment to an unrealistic baseline. |
| Unit of assignment | User, household, store, market, ZIP code, account, device, or another unit that can be kept meaningfully separate. | Letting exposure spill across treatment and control groups. |
| Eligibility rules | Who can enter the test, when they enter, and which prior customers, employees, outliers, or geographies are excluded. | Changing the population after seeing early performance. |
| Primary outcome | One main outcome with a fixed window and source of truth. | Searching across metrics until one looks favorable. |
| Minimum detectable effect | The smallest effect worth acting on, plus expected sample size and variance assumptions. | Overreading a noisy test that could not detect a useful change. |
| Readout rule | The threshold for action: interval excludes zero, lift exceeds margin hurdle, result clears payback, or test remains inconclusive. | Treating every positive point estimate as a win. |
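
The minimum detectable effect row is easier to fill in with a rough sample-size calculation. The sketch below assumes a binary conversion outcome, two equally sized arms, and the standard normal-approximation formula for comparing two proportions. The function name, the 5% significance level, and the 80% power default are illustrative choices, not part of the template.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate units needed per arm to detect a relative lift.

    Assumes a two-sided test on a binary outcome with equal arm sizes
    (normal approximation). All defaults are illustrative assumptions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if relative_mde == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of per-arm variances
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 2% baseline conversion, smallest lift worth acting on is 10%.
print(sample_size_per_arm(baseline_rate=0.02, relative_mde=0.10))
```

Halving the relative effect roughly quadruples the required sample, which is why the smallest effect worth acting on should be agreed before launch rather than adjusted to fit the audience that happens to be available.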

Design choices

Randomized user holdout

Best when exposure can be controlled at the user or household level and the outcome can be measured consistently for both groups. The main risks are identity gaps, cross-device leakage, suppression failures, and audience changes during the test.
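
One common way to keep a user holdout stable is to derive the arm from a hash of the unit ID, so the same user lands in the same group on every lookup. The sketch below is a minimal illustration under that assumption. The function name, the test name used as a salt, and the 10% holdout share are hypothetical.

```python
import hashlib


def assign_arm(unit_id: str, test_name: str, holdout_share: float = 0.10) -> str:
    """Deterministically assign a unit to 'control' or 'treatment'.

    The test name salts the hash so that different tests get independent
    splits. The 10% default holdout is an illustrative assumption.
    """
    digest = hashlib.sha256(f"{test_name}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits mapped to [0, 1)
    return "control" if bucket < holdout_share else "treatment"


# The same ID always maps to the same arm for a given test.
print(assign_arm("user-123", "spring_promo"))
```

Deterministic assignment does not solve identity gaps or cross-device leakage on its own: a person seen under two IDs can still land in both groups.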

Geo or store test

Best when treatment is bought or executed by market, store, or region. Markets should be balanced on pre-period outcome trends, seasonality, media history, price, distribution, and competitive context. A geo test should not rely on one unusually strong control market.
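
A simple pre-launch balance check is the standardized mean difference between treatment and control markets on a pre-period outcome. The sketch below assumes one pre-period value per market (for example, average weekly sales). The market figures are made up for illustration, and the idea of flagging absolute values above roughly 0.1 is a widely used rule of thumb, not a requirement of this template.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((stdev(treatment) ** 2 + stdev(control) ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("no variation in either group; SMD is undefined")
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical pre-period weekly sales, one value per market.
treatment_markets = [100, 120, 110, 130]
control_markets = [105, 115, 112, 128]
print(standardized_mean_difference(treatment_markets, control_markets))
```

The same check can be repeated for each balancing variable the plan names, such as media history or price, and rerun with each control market left out in turn to see whether one market is carrying the comparison.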

Matched-market design

Useful when randomization is not available, but the match must be made before results are visible. Strong matching uses pre-period trend similarity, not only market size or demographic resemblance.
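
Pre-period trend similarity can be scored before launch by correlating each candidate control market's pre-period series with the treated market's series. The sketch below uses Pearson correlation as one possible similarity measure. The market names and weekly figures are invented for illustration, and a real match would usually also consider level, seasonality, and media history.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no trend to correlate")
    return cov / (sx * sy)


def rank_controls(treated, candidates):
    """Sort candidate control markets by pre-period correlation, best first."""
    scores = {name: pearson(treated, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly pre-period outcomes.
treated = [100, 110, 105, 120, 130, 125]
candidates = {
    "market_a": [50, 55, 52, 61, 66, 62],        # smaller, but moves with treated
    "market_b": [200, 190, 195, 180, 170, 176],  # larger, moves the opposite way
}
print(rank_controls(treated, candidates)[0][0])
```

Here the smaller market is the better match because it tracks the treated market's trend, which is the point of matching on trend rather than on size.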

Switchback design

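Useful when treatment can alternate across time windows, such as bidding rules or merchandising conditions. It works only when carryover from one window into the next is limited and when shocks tied to specific periods, such as promotions, site outages, or competitor events, are logged and accounted for in the analysis.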
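
A switchback schedule can be fixed before launch by randomizing the order of treatment and control within consecutive pairs of time windows. The sketch below is one simple way to do that, assuming equal-length windows. The function name and seed are illustrative, and carryover still has to be handled separately, for example by excluding the start of each window from the analysis.

```python
import random


def switchback_schedule(n_windows: int, seed: int) -> list[str]:
    """Randomize arm order within each consecutive pair of time windows.

    Pairing keeps the arms balanced over time; the seed makes the schedule
    reproducible so it can be written into the plan before launch.
    """
    if n_windows <= 0 or n_windows % 2 != 0:
        raise ValueError("n_windows must be a positive even number")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_windows // 2):
        pair = ["treatment", "control"]
        rng.shuffle(pair)  # coin flip for which arm goes first in this pair
        schedule.extend(pair)
    return schedule


print(switchback_schedule(n_windows=8, seed=7))
```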

Leakage and compliance checks

  • Control users or markets should not receive the tested exposure through another campaign, reseller, audience extension, or shared household route.
  • Treatment delivery should be audited against the planned audience, geography, dates, frequency, and budget.
  • Other large changes during the test should be logged: pricing, promotions, inventory, sales coverage, distribution, site outages, and competitor events.
  • Outcome capture should be checked equally for treatment and control. If conversions are matched at a higher rate in one group than the other, the gap shows up in the readout as fake lift.
  • Any excluded observations should be named with counts and reasons before final analysis.
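
Several of the checks above can be reduced to a few numbers compared against limits written into the plan. The sketch below is a minimal illustration. The function name, the inputs, and all three thresholds are hypothetical and should be replaced with whatever the plan pre-states.

```python
def compliance_flags(
    control_exposed_share,
    treatment_delivered_share,
    match_rate_treatment,
    match_rate_control,
    max_leakage=0.02,    # assumed tolerance for exposed control units
    min_delivery=0.90,   # assumed minimum share of planned treatment delivered
    max_match_gap=0.02,  # assumed tolerance for match-rate difference
):
    """Return a list of compliance problems; an empty list means none found."""
    flags = []
    if control_exposed_share > max_leakage:
        flags.append("control leakage")
    if treatment_delivered_share < min_delivery:
        flags.append("under-delivery")
    if abs(match_rate_treatment - match_rate_control) > max_match_gap:
        flags.append("match-rate gap")
    return flags


# Hypothetical audit figures: 5% of control exposed, match rates 80% vs 70%.
print(compliance_flags(0.05, 0.95, 0.80, 0.70))
```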

Readout table

| Result pattern | Reasonable interpretation | Decision posture |
| --- | --- | --- |
| Positive effect, narrow interval, clears profit hurdle | The test supports scaling within the tested population and conditions. | Scale gradually and monitor whether the effect decays. |
| Positive effect, wide interval | The direction is encouraging, but the estimate is too imprecise for a confident decision. | Repeat with more power or lower the decision stakes. |
| Near zero with narrow interval | The tested treatment likely did not create a commercially meaningful effect. | Pause or redesign unless there is a separate strategic reason. |
| Mixed segments | The average hides heterogeneous effects across audiences, geographies, or product groups. | Retest the strongest segment with a pre-stated plan. |
| Broken compliance or leakage | The contrast no longer estimates the intended counterfactual. | Do not treat the result as causal evidence. |
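
The first three rows of the table can be turned into a pre-stated rule by comparing the interval, not the point estimate, with the profit hurdle. The sketch below assumes a binary conversion outcome from a randomized holdout and uses a normal-approximation interval for the difference in conversion rates. The function names, counts, and hurdle values are hypothetical, and the rule does not cover mixed segments or broken compliance, which need judgment rather than a formula.

```python
from math import sqrt
from statistics import NormalDist


def lift_interval(conv_t, n_t, conv_c, n_c, confidence=0.95):
    """Absolute difference in conversion rate with a normal-approximation interval."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


def readout(low, high, hurdle):
    """Map an interval to a decision posture against a pre-stated hurdle."""
    if low > hurdle:
        return "scale gradually"
    if high < hurdle:
        return "pause or redesign"
    return "inconclusive: repeat with more power"


# Hypothetical counts: 1,200 of 50,000 treated converted vs 1,000 of 50,000 held out.
diff, low, high = lift_interval(1200, 50000, 1000, 50000)
print(readout(low, high, hurdle=0.001))  # hurdle below the whole interval
print(readout(low, high, hurdle=0.003))  # hurdle inside the interval
```

The same estimate supports scaling against one hurdle and is inconclusive against another, which is why the hurdle belongs in the plan and not in the readout meeting.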

Language to avoid

| Weak claim | Cleaner claim |
| --- | --- |
| "The campaign drove a 14% lift." | "In this test population and window, the estimated incremental lift was 14%, with the reported interval and caveats below." |
| "The channel works." | "The tested setup produced incremental value under these conditions; other audiences, bids, seasons, and creatives need separate evidence." |
| "The test was inconclusive, but directionally positive." | "The test did not reach the pre-stated decision threshold. The point estimate alone should not justify scale." |
| "Platform attribution confirms the test." | "Attribution can explain observed paths, but the holdout contrast is the causal evidence to inspect." |

Takeaway

An incrementality test is strongest when its humility is designed in. Define the decision, protect the comparison, pre-state the readout, and make uncertainty visible. The goal is not to produce a flattering number. The goal is to learn what the next dollar is likely to change.