Holdout leakage and suppression QA checklist

A holdout is only useful if it remains meaningfully unexposed. When control users, households, stores, or markets receive the tested treatment through another path, the readout can still look rigorous while the treatment-control contrast quietly weakens.

Use this checklist before launch, during delivery QA, and before the final readout. It is written for campaign teams, analysts, publishers, agencies, and buyers who need to know whether an incrementality result still represents the question it was designed to answer.

Where leakage enters

Leakage is not one failure. It is any route that lets the control cell receive the treatment, a close substitute for the treatment, or a different observation rule than the exposed cell.

Leakage path | What to inspect | Why it changes the readout
Audience suppression miss | Control IDs, hashed records, device IDs, household IDs, CRM records, or market lists were not excluded from buying, activation, or delivery systems. | The control group receives some tested exposure, shrinking or distorting the measured treatment difference.
Identity graph mismatch | The assigned unit differs from the delivery unit: household assignment but device delivery, account assignment but cookie delivery, or market assignment with cross-border reach. | Exposure can cross the boundary even when the campaign system appears to obey the holdout rule.
Retargeting or audience extension | Control users enter downstream retargeting pools, lookalike expansion, partner audiences, reseller delivery, or sequential messaging. | A clean first exposure rule can be broken by later tactics that were not part of the original test plan.
Other media or sales outreach | Email, search, affiliate, social, field sales, promotions, or account outreach reaches the control cell during the test window. | The result may compare two different marketing mixes rather than treatment versus business as usual.
Outcome observation mismatch | Treatment and control records have different match rates, lead routing, store reporting, survey recruitment, CRM coverage, or conversion windows. | The measured lift may reflect which cell was easier to observe rather than which cell changed behavior.
Post-assignment filtering | Inactive users, unmatched records, low-delivery markets, stockout locations, or outlier accounts are removed after early results are visible. | The comparison can become selected even if the original assignment was randomized.

Pre-launch suppression brief

The suppression brief should be short enough to use, but specific enough that ad operations, analytics, and the buyer can audit the same boundary later.

Field | Write before launch | Evidence to retain
Assigned unit | User, household, account, store, market, ZIP code, device, or another unit, with the reason that unit can be separated. | Assignment file, timestamp, eligibility rule, and owner.
Suppression surface | Every system that must exclude the control cell: ad server, buying platform, CRM, clean room, email tool, sales list, retail platform, or partner activation. | Suppression upload logs, platform receipts, audience counts, and campaign screenshots or exports.
Exposure definition | The treatment that must be blocked from control: specific campaign, creative theme, offer, media channel, package, sales touch, or discount. | Campaign IDs, deal keys, creative IDs, offer IDs, and active dates.
Allowed background activity | Business-as-usual messages or evergreen media that control units may still receive. | Written exception list and reason each exception does not invalidate the decision.
Leakage tolerance | The maximum acceptable control exposure rate, affected market share, or outcome-observation imbalance before the readout is downgraded. | Pre-stated threshold, escalation owner, and readout language rule.
QA cadence | Launch-day check, early delivery check, weekly or mid-flight check, and pre-readout audit. | Dated QA notes with counts, differences, unresolved risks, and owner decisions.

In-flight QA checks

Launch-day count reconciliation

Compare eligible treatment and control counts against the activation counts in each platform. A sudden drop in one cell, a mismatched geography count, or a missing suppression receipt should be resolved before delivery ramps.
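
A launch-day reconciliation can be as simple as comparing the assigned count per cell with the count each platform reports. The sketch below is illustrative only: the function name `reconcile_counts`, the dictionary layout, and the 2% tolerance are assumptions rather than values from any platform, and the real tolerance should come from the suppression brief.

```python
def reconcile_counts(assigned, platform_counts, tolerance=0.02):
    """Flag cells whose platform count drifts from the assignment file.

    assigned:        {cell_name: units in the assignment file}
    platform_counts: {cell_name: units the platform reports for that cell}
    tolerance:       hypothetical maximum relative gap before a flag is raised
    """
    issues = {}
    for cell, expected in assigned.items():
        observed = platform_counts.get(cell)
        if observed is None:
            # No count usually means no upload receipt for this cell.
            issues[cell] = "no count or receipt from platform"
        elif expected == 0:
            issues[cell] = "assignment file is empty for this cell"
        else:
            gap = abs(observed - expected) / expected
            if gap > tolerance:
                issues[cell] = f"count gap {gap:.1%}"
    return issues


assigned = {"treatment": 10_000, "control": 10_000}
platform_counts = {"treatment": 9_900, "control": 8_000}
print(reconcile_counts(assigned, platform_counts))
# {'control': 'count gap 20.0%'}
```

Running the check once per platform keeps each suppression surface auditable on its own.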

Control exposure audit

Report impressions, reach, clicks, emails, offers, sales touches, or other treatment events observed in the control cell. The audit should use the same unit as assignment whenever possible.
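
As a rough illustration, the audit reduces to joining treatment events to the control assignment list and counting distinct exposed units. The sketch assumes events have already been mapped to the assignment unit (for example, device IDs rolled up to households); the function name and the `(unit_id, event_type)` layout are hypothetical.

```python
def control_exposure_audit(control_units, exposure_events):
    """Return the share of control units with at least one treatment event.

    control_units:   iterable of unit IDs assigned to control
    exposure_events: iterable of (unit_id, event_type) pairs, already mapped
                     to the assignment unit
    """
    control = set(control_units)
    exposed = {}
    for unit_id, event_type in exposure_events:
        if unit_id in control:
            # Track which kinds of treatment event reached each control unit.
            exposed.setdefault(unit_id, set()).add(event_type)
    rate = len(exposed) / len(control) if control else 0.0
    return rate, exposed


control_units = ["h1", "h2", "h3", "h4"]
events = [("h1", "impression"), ("h1", "email"), ("h9", "impression")]
rate, exposed = control_exposure_audit(control_units, events)
print(f"{rate:.0%} of control units exposed")
# 25% of control units exposed
```

Counting distinct units rather than raw events keeps the rate comparable with the leakage tolerance, which is stated per unit.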

Overlap with other tactics

Check whether control units are entering retargeting pools, search remarketing lists, lead-nurture flows, loyalty offers, partner audiences, or regional promotions during the test window.

Delivery imbalance

Treatment delivery should be checked against the plan by geography, device, audience, placement, creative, and date. A test with weak treatment exposure may be inconclusive for operational reasons rather than because the treatment had no effect.

Observation parity

Compare match rates, survey recruitment rates, form processing, lead routing, store coverage, CRM sync timing, and outcome lag across cells. Unequal observation can create or hide lift.
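
One way to make a parity check concrete is to compare an observation rate, such as the match rate, between cells and compute a pooled two-proportion z statistic. This is a minimal sketch under the assumption that units are observed independently; the function name and inputs are illustrative.

```python
import math


def observation_parity(matched_t, total_t, matched_c, total_c):
    """Compare an observation rate (for example match rate) across cells.

    Returns the rate difference (treatment minus control) and a pooled
    two-proportion z statistic.
    """
    rate_t = matched_t / total_t
    rate_c = matched_c / total_c
    pooled = (matched_t + matched_c) / (total_t + total_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_t + 1 / total_c))
    # se is zero only when every unit (or no unit) was observed in both cells.
    z = (rate_t - rate_c) / se if se > 0 else 0.0
    return rate_t - rate_c, z


diff, z = observation_parity(8_200, 10_000, 7_800, 10_000)
print(f"match-rate gap {diff:.1%}, z = {z:.2f}")
```

With large cells even a small gap produces a large z, so the gap itself should also be read against the pre-stated observation-imbalance tolerance.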

Leakage severity score

The goal is not to pretend every test can be perfectly isolated. The goal is to decide whether the readout still deserves causal language or should be downgraded to operational evidence.

Severity | Pattern | How to use the result
Low | Minor control exposure, symmetric outcome capture, documented cause, and no material change to the treatment-control contrast. | Keep causal language bounded to the tested population, but disclose the leakage check and sensitivity read.
Moderate | Noticeable control exposure, partial suppression failure, or a material overlap with another tactic that can be estimated. | Use directional language, show sensitivity ranges, and avoid confident scaling claims until a cleaner test confirms the result.
High | Control group received substantial treatment or a close substitute, or one cell had meaningfully different outcome capture. | Do not use the estimate as causal evidence. Treat the report as delivery and operations learning.
Unknown | Suppression logs, exposure audits, or observation parity checks are missing. | Downgrade the conclusion because the control condition cannot be verified.
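
The severity levels can be turned into a simple rule once thresholds are written down before launch. The sketch below is one possible encoding, not a standard: the cutoffs (`low_max`, `high_min`, `gap_max`) are placeholder assumptions, and qualitative criteria such as a documented cause still need human review.

```python
def leakage_severity(control_exposure_rate, observation_gap,
                     low_max=0.02, high_min=0.10, gap_max=0.05):
    """Map QA measurements to a severity label.

    control_exposure_rate: share of control units exposed, or None if unaudited
    observation_gap:       difference in outcome-capture rate between cells,
                           or None if parity was not checked
    low_max, high_min, gap_max: hypothetical thresholds; set real values in
                           the suppression brief before launch
    """
    if control_exposure_rate is None or observation_gap is None:
        return "Unknown"  # missing evidence cannot be scored
    if control_exposure_rate >= high_min or abs(observation_gap) > gap_max:
        return "High"
    if control_exposure_rate > low_max:
        return "Moderate"
    return "Low"


print(leakage_severity(0.01, 0.01))   # Low
print(leakage_severity(0.05, 0.00))   # Moderate
print(leakage_severity(0.25, 0.00))   # High
print(leakage_severity(None, 0.00))   # Unknown
```

Fixing the thresholds as defaults before results are visible makes it harder to adjust the label after the fact.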

Readout language by QA result

QA finding | Careful wording | Wording to avoid
Suppression clean, exposure strong, outcome capture balanced. | The test estimates incremental impact for this eligible population, treatment, and readout window. | The channel caused the same lift everywhere.
Small control leakage, sensitivity still clears the decision threshold. | The result remains supportive under the documented leakage sensitivity, but should stay bounded to the tested setup. | Leakage was immaterial, so the number is exact.
Moderate leakage or overlapping tactics. | The result is directional because the holdout was partially exposed or affected by related activity. | The positive lift proves the treatment worked.
Outcome observation differs by cell. | The readout cannot separate behavioral lift from differences in matching, routing, survey response, or data capture. | The measured conversions prove incremental impact.
Suppression evidence missing. | The report describes observed performance under an unverified comparison, not a clean causal estimate. | The test design guarantees incrementality.
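
One common form of leakage sensitivity rescales the observed lift by the exposure contrast between cells (the Wald, or instrumental-variable, adjustment). It assumes that assignment affects outcomes only through exposure and that leaked exposure has the same effect as planned exposure. Neither assumption comes from this checklist, so treat the output as a range check rather than a corrected estimate. The function name and inputs are illustrative.

```python
def exposure_adjusted_lift(observed_lift, treatment_exposure_rate,
                           control_exposure_rate):
    """Rescale an intent-to-treat lift by the exposure contrast between cells.

    observed_lift:           treatment outcome rate minus control outcome rate
    treatment_exposure_rate: share of treatment units actually exposed
    control_exposure_rate:   share of control units exposed through leakage
    """
    contrast = treatment_exposure_rate - control_exposure_rate
    if contrast <= 0:
        raise ValueError("control was exposed at least as much as treatment")
    return observed_lift / contrast


# Sweep plausible leakage rates to produce a sensitivity range.
for leak in (0.0, 0.05, 0.10):
    adjusted = exposure_adjusted_lift(0.009, 0.90, leak)
    print(f"control exposure {leak:.0%}: adjusted lift {adjusted:.4f}")
```

If the decision holds across the whole sweep, the "supportive under sensitivity" wording is defensible. If it flips, directional language is the safer choice.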

Questions for the readout call

  • Which systems received the suppression list, and can each system show a dated upload, audience count, or exclusion log?
  • Was the assignment unit the same as the delivery and outcome unit? If not, what cross-unit leakage was expected?
  • How many control units received impressions, emails, offers, calls, retargeting, or related exposure during the window?
  • Were any users, markets, accounts, stores, leads, or outcomes removed after assignment?
  • Did treatment and control have comparable match rates, survey recruitment, lead routing, store coverage, and conversion lag?
  • What leakage threshold would have downgraded the conclusion before the result was visible?

Takeaway

A holdout is not protected by intention. It is protected by visible suppression, exposure audits, stable assignment, and equal outcome capture. When those checks are present, a lift result can support a bounded decision. When they are absent, the most honest readout may be that the campaign delivered, but the counterfactual was not clean enough to prove what changed.