Incrementality test plan template

Published July 3, 2026. Updated July 5, 2026. Status: evergreen source page.

Incrementality work gets weaker when the test is treated as a reporting feature instead of a decision design. A good plan is written before results are visible. It states what decision is being made, what counterfactual is needed, how assignment works, and what result would change behavior.

This template is for marketing teams comparing a campaign, channel, audience, offer, creative, or retail-media package against a credible alternative. It is deliberately plain: if a result cannot pass these checks, it should not be used as a confident budget signal.

Figure: a central decision folder connected to evidence cards for target, assignment, audience, outcome, and balance checks. Start with the decision folder, not the future chart. A useful incrementality plan names the action, the comparison, the assignment unit, the outcome, and the readout rule before any performance number is visible.

The one-page plan

The one-page plan should make the test answerable by someone who did not sit in the kickoff meeting. If the decision, population, comparison, and action threshold are not explicit, the readout will invite a stronger claim than the design can support.

Field	What to write before launch	Failure mode it prevents
Decision	The exact action the result will inform: increase budget, pause spend, change audience, approve a vendor, or redesign the offer.	Running a test that produces an interesting number but no operational choice.
Estimand	The causal effect being estimated: incremental conversions, revenue, margin, retained customers, store visits, or another outcome.	Reporting a metric that does not match the business decision.
Counterfactual	What would have happened without the tested treatment: no ad, business-as-usual media, different creative, different offer, or delayed launch.	Comparing treatment to an unrealistic baseline.
Unit of assignment	User, household, store, market, ZIP code, account, device, or another unit that can be kept meaningfully separate.	Letting exposure spill across treatment and control groups.
Eligibility rules	Who can enter the test, when they enter, and which prior customers, employees, outliers, or geographies are excluded.	Changing the population after seeing early performance.
Primary outcome	One main outcome with a fixed window and source of truth.	Searching across metrics until one looks favorable.
Minimum detectable effect	The smallest effect worth acting on, plus expected sample size and variance assumptions.	Overreading a noisy test that could not detect a useful change.
Readout rule	The threshold for action (for example, the interval excludes zero, the lift exceeds the margin hurdle, or the result clears payback) and the posture to take if no threshold is met and the test remains inconclusive.	Treating every positive point estimate as a win.
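The plan fields and the minimum-detectable-effect row can be turned into a pre-launch check. This is a minimal sketch, assuming a binary conversion outcome, a two-sided two-proportion z-test with the normal approximation, and 80% power at a 5% significance level. `LiftTestPlan`, `required_n_per_arm`, and `prelaunch_problems` are hypothetical names; real variance assumptions (clustering by household or market, for example) would raise the sample requirement.

```python
import math
from dataclasses import dataclass, fields
from statistics import NormalDist


@dataclass
class LiftTestPlan:
    """Hypothetical one-page plan record; field names mirror the table above."""
    decision: str = ""
    estimand: str = ""
    counterfactual: str = ""
    unit_of_assignment: str = ""
    eligibility_rules: str = ""
    primary_outcome: str = ""
    readout_rule: str = ""
    baseline_rate: float = 0.0       # assumed control conversion rate
    min_relative_lift: float = 0.0   # smallest relative lift worth acting on
    planned_n_per_arm: int = 0


def required_n_per_arm(p_control: float, relative_lift: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm sample size for a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_control * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_treat - p_control) ** 2)


def prelaunch_problems(plan: LiftTestPlan) -> list[str]:
    """List everything that should block launch: empty text fields or an underpowered design."""
    problems = [f"missing: {f.name}" for f in fields(plan)
                if isinstance(getattr(plan, f.name), str) and not getattr(plan, f.name).strip()]
    if plan.baseline_rate <= 0 or plan.min_relative_lift <= 0:
        problems.append("missing: baseline rate or minimum detectable effect")
    else:
        needed = required_n_per_arm(plan.baseline_rate, plan.min_relative_lift)
        if plan.planned_n_per_arm < needed:
            problems.append(f"underpowered: planned {plan.planned_n_per_arm} per arm, need about {needed}")
    return problems
```

Under these assumptions, a 2% baseline with a 10% relative lift as the smallest useful effect asks for roughly 81,000 units per arm, and halving the lift roughly quadruples that requirement. That is why the minimum detectable effect belongs in the plan before launch rather than in the readout.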

Design choices

The design choice is really a question about the counterfactual: who or what stands in for the result that would have happened without the tested campaign? The answer determines whether the plan needs user assignment, household assignment, stores, markets, ZIP codes, time windows, or a matched comparison.

Figure: treatment and control lanes separated by a protected barrier, both ending at a shared outcome measurement card. Treatment and control need the same outcome ruler but different exposure paths. The protected lane is the counterfactual: the best available view of what would have happened without the tested campaign condition.

Randomized user holdout

Best when exposure can be controlled at the user or household level and the outcome can be measured consistently for both groups. The main risks are identity gaps, cross-device leakage, suppression failures, and audience changes during the test.
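One way to limit cross-device leakage is to assign at the household level with a deterministic hash, so every device and user mapped to a household lands in the same arm. This is a minimal sketch, assuming a hypothetical `household_of` lookup from user to household and a per-test salt; users with no resolved household are returned as excluded so they can be counted, rather than silently guessed into an arm.

```python
import hashlib


def assign_arm(household_id: str, test_salt: str, holdout_share: float = 0.1) -> str:
    """Deterministically map a household to 'control' or 'treatment'.

    The per-test salt keeps assignments from repeating across different tests.
    """
    digest = hashlib.sha256(f"{test_salt}:{household_id}".encode("utf-8")).hexdigest()
    # Turn the first 64 bits of the hash into a roughly uniform number in [0, 1).
    bucket = int(digest[:16], 16) / 16**16
    return "control" if bucket < holdout_share else "treatment"


def arm_for_user(user_id: str, household_of: dict[str, str], test_salt: str,
                 holdout_share: float = 0.1) -> str:
    """Users inherit their household's arm; unresolved identities are excluded, not guessed."""
    household = household_of.get(user_id)
    if household is None:
        return "excluded"  # record with a reason so it appears in the exclusion counts
    return assign_arm(household, test_salt, holdout_share)
```

Because the arm is a pure function of household and salt, suppression lists can be rebuilt on any platform without sharing a stored assignment table, which reduces one common source of suppression failures.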

Geo or store test

Best when treatment is bought or executed by market, store, or region. Markets should be balanced on pre-period outcome trends, seasonality, media history, price, distribution, and competitive context. A geo test should not rely on one unusually strong control market.
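The balance requirements above can be partly automated on pre-period data. This is a minimal sketch, assuming weekly outcome totals per market over the same pre-period weeks; it checks only two things: whether pooled treatment and control markets grew at similar rates, and whether any single control market dominates the control baseline. `geo_balance_flags` and both thresholds are illustrative assumptions, and seasonality, price, distribution, and competitive context still need separate review.

```python
from statistics import mean


def pre_period_growth(series: list[float]) -> float:
    """Relative change in mean outcome from the first half to the second half of the pre-period."""
    half = len(series) // 2
    first, second = mean(series[:half]), mean(series[half:])
    return (second - first) / first


def geo_balance_flags(pre: dict[str, list[float]], treated: set[str],
                      max_growth_gap: float = 0.05,
                      max_control_share: float = 0.40) -> list[str]:
    """Flag pre-period imbalance between treatment and control markets.

    `pre` maps each market to weekly outcome totals over the same pre-period weeks.
    Both thresholds are illustrative assumptions, not industry standards.
    """
    controls = [m for m in pre if m not in treated]
    if not controls or not treated:
        raise ValueError("need at least one treatment and one control market")
    weeks = len(next(iter(pre.values())))

    def pooled(markets) -> list[float]:
        # Sum markets week by week so each arm is compared as a single series.
        return [sum(pre[m][w] for m in markets) for w in range(weeks)]

    flags = []
    gap = pre_period_growth(pooled(treated)) - pre_period_growth(pooled(controls))
    if abs(gap) > max_growth_gap:
        flags.append(f"pre-period growth gap of {gap:+.1%} between arms")

    # A control arm dominated by one market is fragile: one local shock moves the whole baseline.
    control_volume = sum(sum(pre[m]) for m in controls)
    for m in controls:
        share = sum(pre[m]) / control_volume
        if share > max_control_share:
            flags.append(f"control market {m} carries {share:.0%} of control volume")
    return flags
```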

Matched-market design

Useful when randomization is not available, but the match must be made before results are visible. Strong matching uses pre-period trend similarity, not only market size or demographic resemblance.

Switchback design

Useful when treatment can alternate across time windows, such as bidding rules or merchandising conditions. It works only when carryover is limited and time-period shocks are handled honestly.
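A switchback schedule can handle carryover by excluding a washout period at the start of each window from analysis. This is a minimal sketch, assuming hour-based windows, arms randomized within balanced pairs of windows, and a washout long enough to absorb carryover; the window and washout lengths are placeholders that would need to be justified for the specific bidding rule or merchandising condition.

```python
import random
from dataclasses import dataclass


@dataclass
class Window:
    start_hour: int
    end_hour: int
    arm: str
    analysis_start_hour: int  # first hour counted in the readout


def switchback_schedule(total_hours: int, window_hours: int,
                        washout_hours: int, seed: int) -> list[Window]:
    """Randomize arms in balanced pairs of windows and mark a washout at each window start.

    Washout hours still run under the assigned condition but are excluded from analysis,
    so carryover from the previous window does not contaminate the comparison.
    """
    if washout_hours >= window_hours:
        raise ValueError("washout must be shorter than the window")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible and auditable
    n_windows = total_hours // window_hours
    arms: list[str] = []
    for _ in range(n_windows // 2):
        pair = ["treatment", "control"]
        rng.shuffle(pair)
        arms.extend(pair)
    if n_windows % 2:
        arms.append(rng.choice(["treatment", "control"]))
    return [Window(i * window_hours, (i + 1) * window_hours, arm,
                   i * window_hours + washout_hours)
            for i, arm in enumerate(arms)]
```

Pairing keeps the two arms spread evenly across the test period, which helps when a time-period shock lands in one part of the calendar.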

Leakage and compliance checks

A lift readout is only as credible as the contrast it preserves. Leakage is any route that lets the control group receive treatment-like exposure or lets the treatment group be measured differently from control. Compliance is the proof that the campaign actually ran as planned.

Figure: campaign records moving through audience, separation, overlap, timing, and outcome-capture checks into a protected evidence packet. Run leakage checks before accepting the readout. Audience overlap, suppression breaks, timing mismatches, outcome capture gaps, and unexplained exclusions can turn a causal test into a descriptive campaign report.

Control users or markets should not receive the tested exposure through another campaign, reseller, audience extension, or shared household route.
Treatment delivery should be audited against the planned audience, geography, dates, frequency, and budget.
Other large changes during the test should be logged: pricing, promotions, inventory, sales coverage, distribution, site outages, and competitor events.
Outcome capture should be checked equally for treatment and control. If conversions are matched back to test units at a higher rate in one group, the gap in match rates can show up as lift the campaign did not cause.
Any excluded observations should be named with counts and reasons before final analysis.
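Three of these checks (control exposure, outcome match-rate parity, and exclusion counts) can be summarized from delivery logs and outcome joins. This is a minimal sketch, assuming sets of control unit IDs and exposed unit IDs, a per-arm match rate for outcome events, and a list of `(unit_id, reason)` exclusions; `leakage_report` and its thresholds are hypothetical. Delivery audits against plan and the log of outside changes still need manual review.

```python
from collections import Counter


def leakage_report(control_ids: set[str], exposed_ids: set[str],
                   match_rate: dict[str, float], exclusions: list[tuple[str, str]],
                   max_leak_share: float = 0.01, max_match_gap: float = 0.02) -> dict:
    """Summarize control exposure, outcome-capture parity, and exclusions.

    match_rate: share of outcome events joined back to a test unit, keyed by arm.
    Thresholds are illustrative assumptions.
    """
    leaked = control_ids & exposed_ids  # control units that appear in delivery logs
    leak_share = len(leaked) / len(control_ids) if control_ids else 0.0
    match_gap = abs(match_rate["treatment"] - match_rate["control"])
    return {
        "control_units_exposed": len(leaked),
        "leak_share": leak_share,
        "leak_flag": leak_share > max_leak_share,
        "match_rate_gap": match_gap,
        "match_flag": match_gap > max_match_gap,
        # Named counts and reasons, ready to report before final analysis.
        "exclusions_by_reason": dict(Counter(reason for _, reason in exclusions)),
    }
```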

Readout table

The readout should map each result pattern to a decision posture. A positive point estimate does not automatically mean scale; an inconclusive result does not automatically mean failure; and a broken contrast should not be rescued with confident causal language.

Figure: lift result cards, uncertainty intervals, mixed segments, and compliance failure symbols routed to scale, retest, pause, segment review, and reject-causal-claim outcomes. Each result pattern gets its own posture. Clear lift can support careful scale, wide intervals call for more power, near-zero lift can protect budget, mixed segments need a new pre-stated plan, and broken compliance should stop causal claims.

Result pattern	Reasonable interpretation	Decision posture
Positive effect, narrow interval, clears profit hurdle	The test supports scaling within the tested population and conditions.	Scale gradually and monitor whether the effect decays.
Positive effect, wide interval	The direction is encouraging, but the estimate is too imprecise for a confident decision.	Repeat with more power or lower the decision stakes.
Near zero with narrow interval	The tested treatment likely did not create a commercially meaningful effect.	Pause or redesign unless there is a separate strategic reason.
Mixed segments	The average hides heterogeneous effects across audiences, geographies, or product groups.	Retest the strongest segment with a pre-stated plan.
Broken compliance or leakage	The contrast no longer estimates the intended counterfactual.	Do not treat the result as causal evidence.
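The single-population rows of the table can be sketched as a small decision function. This is a minimal sketch, assuming a binary conversion outcome, a Wald interval for the difference in conversion rates, and a hypothetical `hurdle` expressed as the smallest absolute lift that clears the profit threshold. The mixed-segments row is not modeled, because segment retests need their own pre-stated plan.

```python
import math
from statistics import NormalDist


def lift_estimate(conv_t: int, n_t: int, conv_c: int, n_c: int, level: float = 0.95) -> dict:
    """Absolute lift (difference in conversion rates) with a Wald interval, plus relative lift."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"abs_lift": diff, "ci": (diff - z * se, diff + z * se), "rel_lift": diff / p_c}


def decision_posture(ci: tuple[float, float], hurdle: float, contrast_broken: bool) -> str:
    """Map an interval and a compliance verdict to a posture from the readout table."""
    low, high = ci
    if contrast_broken:
        return "do not treat as causal evidence"      # leakage or compliance failure
    if low >= hurdle:
        return "scale gradually and monitor decay"    # whole interval clears the hurdle
    if high < hurdle:
        return "pause or redesign"                    # a meaningful effect is ruled out
    return "repeat with more power or lower the stakes"  # interval straddles the hurdle
```

For example, 1,200 conversions from 50,000 treated units against 1,000 from 50,000 controls gives a 20% relative lift. Whether that supports scaling depends on whether the lower end of the absolute-lift interval clears the pre-stated hurdle, not on the point estimate alone.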

Language to avoid

Weak claim	Cleaner claim
"The campaign drove a 14% lift."	"In this test population and window, the estimated incremental lift was 14%, with the reported interval and caveats below."
"The channel works."	"The tested setup produced incremental value under these conditions; other audiences, bids, seasons, and creatives need separate evidence."
"The test was inconclusive, but directionally positive."	"The test did not reach the pre-stated decision threshold. The point estimate alone should not justify scale."
"Platform attribution confirms the test."	"Attribution can explain observed paths, but the holdout contrast is the causal evidence to inspect."

Takeaway

An incrementality test is strongest when its humility is designed in. Define the decision, protect the comparison, pre-state the readout, and make uncertainty visible. The goal is not to produce a flattering number. The goal is to learn what the next dollar is likely to change.
