
Geo lift test design checklist

Published July 3, 2026. Updated July 3, 2026. Status: evergreen source page.

Geo lift tests are useful when advertising, retail distribution, pricing, or operations change at a market level. They can also create confident-looking results from weak comparisons if markets are chosen because they are convenient instead of comparable.

The goal is not to make every market identical. The goal is to show that the treatment markets and comparison markets would have moved similarly had the campaign not changed.

[Figure: Planning-board illustration showing paired geo test markets, similar pre-period trend lines, a protected holdout region, and blocked leakage paths.]
Caption: A credible geo design is visible before launch: candidate markets are paired, pre-period movement is checked, the control region is protected, and spillover paths are named before the readout can favor the campaign.

Geo test routes

Find the weakest design field before launch.

Use the route that matches the unresolved part of the test: market comparison, pre-launch brief, power, leakage, baseline language, or uncertainty.

Route	Goal	When to use
Market match	Pick the comparison before results	Use when candidate markets, holdouts, matching rules, and downgrade triggers need to be set before launch.
Test brief	Lock the decision design	Use when assignment, counterfactual, outcome, exclusion policy, and readout rule need one plan.
Power check	Confirm the action threshold	Use when market count, base rate, variance, and decision risk need review before spend moves.
Leakage review	Protect control markets	Use when media, sales outreach, reseller demand, or household paths may reach the comparison group.
Baseline claim	Name what the lift compares against	Use when a readout depends on prior periods, matched markets, holdouts, or modeled demand.
Uncertainty	Bound the budget conclusion	Use when intervals, thresholds, noisy slices, or decision language need review before action.

When a geo test fits

Good fit	Weak fit	Design implication
Media can be bought or withheld by market.	The campaign spills heavily across market borders.	Use larger market groups, exclude border areas, or choose another design.
Outcomes are observed consistently by location.	Sales, leads, or visits are missing for some regions.	Fix measurement coverage before assigning treatment.
Enough pre-period history exists to check trend similarity.	The product, price, or tracking system just changed.	Delay the test or treat the result as directional only.
Markets are not being changed by unrelated launches.	A store rollout, promo, supply issue, or pricing change overlaps the test.	Remove affected markets or pre-register adjustment rules.

Market-selection packet

Before markets are assigned, keep a short packet that lets someone outside the campaign see why the tested geography is a fair read of the decision. Write the packet before anyone knows which choice of treatment markets would make the result look flattering.

Packet field	What to record	Why it protects the test
Decision scope	The budget, region, package, channel, or rollout choice the result can change.	Prevents a local test from being used for a broader national claim without evidence.
Candidate pool	All markets considered, the inclusion rule, and markets removed before assignment.	Shows whether the comparison was selected from a fair universe rather than after seeing a convenient pattern.
Balance snapshot	Pre-period outcome level, trend, volatility, spend, stores, distribution, and seasonality by candidate market.	Lets reviewers see whether treatment and control would plausibly have moved together without the campaign.
Local shock log	Promotions, pricing changes, stock issues, weather, events, competitor launches, and tracking changes by market.	Separates media effect from operating changes that can shift local demand.
Exposure boundary	How media, sales outreach, retail activity, and audience extension will be kept inside treatment markets.	Protects control markets from hidden treatment that weakens the counterfactual.
Readout downgrade rule	The conditions that would make the result directional, inconclusive, or not usable for the decision.	Keeps the conclusion from being rewritten around a positive point estimate.

The minimum design brief

Before spend moves, write a short brief that can survive a skeptical readout. It should state the business decision, the test markets, the control markets, the treatment period, the primary outcome, the expected lag, and the rule for calling the result useful.

Decision

Name the decision the result will change. A test for national budget expansion needs a different readout than a test for a local launch playbook.

Estimand

State the lift being estimated: incremental sales, leads, visits, subscriptions, qualified accounts, or another primary outcome during a defined window.

Assignment

Explain why each market is in treatment or control. Avoid letting the highest-opportunity markets become treatment simply because the campaign team wants them most.
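One way to keep preference out of assignment is to pair markets on pre-period level and let a seeded random draw decide which member of each pair is treated. The sketch below is illustrative only: the function name `assign_within_pairs`, the market labels, and the pre-period values are hypothetical, and pairing on a single level metric is a simplifying assumption (a real design would usually match on trend and other drivers too).

```python
import random

def assign_within_pairs(pre_period_mean, seed):
    """Pair markets by pre-period level, then randomize treatment within each pair.

    pre_period_mean: dict of market name -> average pre-period outcome
    (hypothetical input). Returns a list of (treatment, control) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be audited later
    ranked = sorted(pre_period_mean, key=pre_period_mean.get)
    if len(ranked) % 2:
        raise ValueError("need an even number of candidate markets")
    pairs = []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]  # neighbours in the ranking are closest in level
        rng.shuffle(pair)                  # chance, not preference, decides treatment
        pairs.append((pair[0], pair[1]))
    return pairs

# Hypothetical candidate pool: market -> average weekly outcome before launch.
candidates = {"A": 120.0, "B": 95.0, "C": 118.0, "D": 97.0, "E": 60.0, "F": 64.0}
assignment = assign_within_pairs(candidates, seed=2026)
for treated, control in assignment:
    print(f"treatment={treated} control={control}")
```

Recording the seed and the candidate pool alongside the output gives a reviewer a simple answer to why each market landed where it did.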

Guardrails

List changes that would invalidate or downgrade the readout: inventory breaks, channel outages, price changes, competitor shocks, tracking changes, or major weather disruptions.

Pre-period checks

Check	What to inspect	Red flag
Level balance	Average outcomes, spend, customer base, stores, distribution, and seasonality.	Treatment markets are much larger, richer, or more mature than controls.
Trend balance	Weekly or daily movement before the campaign.	Treatment is already accelerating faster before media changes.
Volatility	Noise in the primary outcome and any extreme weeks.	A few markets swing enough to dominate the readout.
Covariates	Price, promotions, stock, distribution, local events, holidays, and competitor pressure.	Important drivers are only tracked after the test starts.
Placebo timing	Apply the proposed method to an earlier period with no treatment.	The method finds lift where no campaign change happened.
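The level, trend, and volatility checks in the table can be sketched as a small pre-period comparison. This is a minimal illustration, not a full matching method: the function names, the weekly series, and the choice to measure volatility as the spread of week-over-week changes are all assumptions made for the example.

```python
from statistics import mean, pstdev

def slope(series):
    """Ordinary least-squares slope of the series against its time index."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = mean(series)
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

def weekly_change_spread(series):
    """Population standard deviation of week-over-week changes (assumed volatility proxy)."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return pstdev(changes)

def balance_snapshot(treatment, control):
    """Compare level, trend, and volatility for two pre-period weekly series."""
    return {
        "level_ratio": mean(treatment) / mean(control),
        "slope_gap": slope(treatment) - slope(control),
        "treatment_volatility": weekly_change_spread(treatment),
        "control_volatility": weekly_change_spread(control),
    }

# Hypothetical pre-period weekly outcomes for one treatment and one control group.
treatment_pre = [100, 106, 106, 114, 114, 120]
control_pre = [100, 101, 103, 102, 104, 105]
snapshot = balance_snapshot(treatment_pre, control_pre)
for name, value in snapshot.items():
    print(f"{name}: {value:.3f}")
```

A positive `slope_gap` here means treatment was already rising faster before any media change, which is the trend-balance red flag in the table. Running the same comparison on an earlier window with no treatment is one simple form of the placebo timing check.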

[Figure: Geo lift seasonality review board comparing treatment and control markets with seasonal trend differences before lift language is approved.]
Caption: Seasonality is not a footnote in geo testing. If treatment markets were already moving differently before launch, the result may still be useful, but the conclusion must name the imbalance and downgrade the causal claim.

Geo launch readiness score

Use this score before the test starts, not after the result is known. A simple rule works better than a long debate: green items can support lift language, yellow items require directional wording, and red items should stop the test or move the decision to a weaker evidence standard.

Design field	Green	Yellow	Red
Market pool	Candidate markets are listed before assignment with clear inclusion and exclusion rules.	The pool is known, but some removals rely on judgment that needs a written note.	Treatment markets are chosen first and controls are found later.
Trend similarity	Treatment and control markets show similar pre-period levels, slopes, and volatility for the primary outcome.	One balance field is weak, but the difference is stable and documented before launch.	Treatment is already rising faster, or the control series is too noisy to anchor the readout.
Local shock log	Pricing, distribution, promotions, stock, tracking, and local events are tracked by market before the test.	Known shocks are tracked, but smaller operating changes may be missed.	Major business changes can happen by market without appearing in the test record.
Exposure separation	Media, audience extension, sales outreach, and retail activation stay inside treatment boundaries.	Some spillover is plausible, but suppression checks and border exclusions are planned.	Control markets can receive material treatment through another buying or sales path.
Outcome maturity	The outcome source, lag window, duplicate handling, and readout date are locked before launch.	The source is stable, but late-arriving outcomes require a second bounded readout.	The team can choose the outcome window after seeing which one looks favorable.
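The green, yellow, and red rule can be written down as a small lookup so the allowed language is fixed before results arrive. The sketch below assumes that the worst rating across design fields sets the overall status; the page does not specify an aggregation rule, so treat that choice, the function name `readiness`, and the sample ratings as hypothetical.

```python
# Severity order for the three ratings used in the readiness table.
RANK = {"green": 0, "yellow": 1, "red": 2}

# Language each rating allows, taken from the rule stated above the table.
LANGUAGE = {
    "green": "lift language",
    "yellow": "directional wording",
    "red": "stop the test or use a weaker evidence standard",
}

def readiness(ratings):
    """Return the worst rating across design fields and the language it allows.

    Assumption: the worst field wins. An unknown rating raises KeyError.
    """
    worst = max(ratings.values(), key=RANK.__getitem__)
    return worst, LANGUAGE[worst]

# Hypothetical pre-launch ratings for the five design fields.
ratings = {
    "market_pool": "green",
    "trend_similarity": "yellow",
    "local_shock_log": "green",
    "exposure_separation": "green",
    "outcome_maturity": "green",
}
status, allowed = readiness(ratings)
print(f"overall={status}: {allowed}")
```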

Analysis choices to lock early

A credible geo readout is usually won before the campaign starts. Lock the outcome, exclusion rules, matching approach, covariates, readout date, and uncertainty treatment before anyone can see whether the answer looks favorable.

Choose one primary outcome and keep secondary metrics clearly labeled.
Use pre-period performance to create controls, but do not tune the control group after seeing test-period lift.
Show absolute lift, percentage lift, confidence or credible intervals, and the implied business value.
Report market-level variation instead of hiding it behind one average.
Run sensitivity checks with obvious outliers removed and explain whether the conclusion changes.
Separate media effect from distribution, pricing, promotion, and product availability changes.
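As a rough illustration of the reporting items above, the sketch below computes a difference-in-differences lift for each matched pair, then reports absolute lift, percentage lift against a counterfactual, a bootstrap interval, and the per-pair values. Difference-in-differences and the percentile bootstrap are techniques chosen for this example, not methods prescribed by this page, and all field names and numbers are hypothetical. With only a handful of pairs, a bootstrap interval is coarse and should be read as indicative.

```python
import random
from statistics import mean

def pair_lift(pair):
    """Difference-in-differences for one matched pair: treatment change minus control change."""
    t_change = pair["t_test"] - pair["t_pre"]
    c_change = pair["c_test"] - pair["c_pre"]
    return t_change - c_change

def summarize(pairs, n_boot=2000, seed=0):
    """Absolute lift, percent lift, a 95% percentile bootstrap interval, and per-pair lifts."""
    lifts = [pair_lift(p) for p in pairs]
    absolute = mean(lifts)
    # Counterfactual: treatment pre-period level plus the paired control's change.
    counterfactual = mean(p["t_pre"] + (p["c_test"] - p["c_pre"]) for p in pairs)
    rng = random.Random(seed)  # seeded so the interval is reproducible
    boots = sorted(mean(rng.choices(lifts, k=len(lifts))) for _ in range(n_boot))
    low = boots[int(0.025 * n_boot)]
    high = boots[int(0.975 * n_boot) - 1]
    return {
        "absolute_lift": absolute,
        "percent_lift": 100 * absolute / counterfactual,
        "interval_95": (low, high),
        "by_market": lifts,  # reported so one average does not hide variation
    }

# Hypothetical average weekly outcomes per matched pair (pre-period and test period).
pairs = [
    {"t_pre": 100, "t_test": 115, "c_pre": 100, "c_test": 105},
    {"t_pre": 80, "t_test": 90, "c_pre": 82, "c_test": 84},
    {"t_pre": 120, "t_test": 126, "c_pre": 118, "c_test": 122},
    {"t_pre": 60, "t_test": 72, "c_pre": 61, "c_test": 65},
]
summary = summarize(pairs)
print(summary)
```

Multiplying the absolute lift by a margin or value per outcome would give the implied business value; that figure is left out here because it depends on inputs this page does not define.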

Worked downgrade example

A regional advertiser wants to test a new paid-media package in six metro areas and compare it with six untreated areas. The first pass looks usable: the markets are similar in population and store count, the campaign can be bought by geography, and the primary outcome is weekly qualified leads.

The readiness score changes the conclusion. Two treatment metros were already improving faster during the pre-period, one control metro will receive a separate sales push, and the lead system has a two-week lag that is not included in the original readout date. The test can still teach something, but it should not be allowed to produce a clean causal headline.

Finding before launch	Decision	Allowed readout language
Treatment trend is stronger in two of six paired markets.	Keep the markets only if the analysis plan names the imbalance and runs a sensitivity readout without those pairs.	"Observed lift is sensitive to market selection."
One control market will receive sales outreach during the treatment window.	Remove it, replace it before launch under the original matching rule, or downgrade the design.	"The comparison is partially contaminated."
Qualified leads mature over two weeks, but the original readout is after seven days.	Set the primary readout after the maturity window and mark the early result as operational only.	"Early lead volume is not the final lift estimate."
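Two of the decisions in this table are simple enough to sketch: the sensitivity readout that drops the flagged pairs, and the primary readout date pushed past the lead maturity window. The pair labels, lift values, and dates below are invented for illustration, and the function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

def primary_readout_date(treatment_end, maturity_days):
    """Earliest date the primary readout should run, after outcomes have matured."""
    return treatment_end + timedelta(days=maturity_days)

def sensitivity(lift_by_pair, flagged):
    """Average lift across all pairs and again with the flagged pairs removed."""
    kept = [lift for name, lift in lift_by_pair.items() if name not in flagged]
    return {
        "all_pairs": mean(lift_by_pair.values()),
        "without_flagged": mean(kept),
    }

# Hypothetical per-pair lift in weekly qualified leads; pairs 1 and 2 had the
# stronger pre-period treatment trend and are flagged in the analysis plan.
lift_by_pair = {
    "pair_1": 14.0, "pair_2": 12.0, "pair_3": 3.0,
    "pair_4": 4.0, "pair_5": 2.0, "pair_6": 1.0,
}
result = sensitivity(lift_by_pair, flagged={"pair_1", "pair_2"})
print(result)

# Hypothetical treatment end date with the two-week lead maturity window.
print(primary_readout_date(date(2026, 9, 30), maturity_days=14))
```

If the two averages diverge as much as they do in this invented data, the readout language in the first row ("Observed lift is sensitive to market selection") is the honest summary.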

Readout language that stays honest

Weak phrasing	Stronger phrasing
The campaign drove a 12% sales lift.	In the tested markets, observed sales were 12% above the matched comparison estimate during the readout window.
The test proves this channel works nationally.	The test supports expansion if future markets have similar demand, media delivery, and operating conditions.
The control group was similar in size.	The control group had similar pre-period levels, trends, volatility, and tracked business drivers.
The result was positive, so the test passed.	The result cleared the pre-set decision threshold after uncertainty, costs, and operational constraints were considered.

Takeaway

A geo lift test is a comparison discipline, not just a map with shaded regions. The central question is whether the control markets make a believable estimate of what would have happened anyway.
