Incrementality
Holdout leakage and suppression QA checklist
A holdout is only useful if it remains meaningfully unexposed. When control users, households, stores, or markets receive the tested treatment through another path, the readout may still look scientific while the contrast quietly weakens.
Use this checklist before launch, during delivery QA, and before the final readout. It is written for campaign teams, analysts, publishers, agencies, and buyers who need to know whether an incrementality result still represents the question it was designed to answer.
Where leakage enters
Leakage is not one failure. It is any route that lets the control cell receive the treatment, a close substitute for the treatment, or a different observation rule than the exposed cell.
| Leakage path | What to inspect | Why it changes the readout |
|---|---|---|
| Audience suppression miss | Control IDs, hashed records, device IDs, household IDs, CRM records, or market lists were not excluded from buying, activation, or delivery systems. | The control group receives some tested exposure, shrinking or distorting the measured treatment difference. |
| Identity graph mismatch | The assigned unit differs from the delivery unit: household assignment but device delivery, account assignment but cookie delivery, or market assignment with cross-border reach. | Exposure can cross the boundary even when the campaign system appears to obey the holdout rule. |
| Retargeting or audience extension | Control users enter downstream retargeting pools, lookalike expansion, partner audiences, reseller delivery, or sequential messaging. | A clean first exposure rule can be broken by later tactics that were not part of the original test plan. |
| Other media or sales outreach | Email, search, affiliate, social, field sales, promotions, or account outreach reaches the control cell during the test window. | The result may compare two different marketing mixes rather than treatment versus business as usual. |
| Outcome observation mismatch | Treatment and control records have different match rates, lead routing, store reporting, survey recruitment, CRM coverage, or conversion windows. | The measured lift may reflect which cell was easier to observe rather than which cell changed behavior. |
| Post-assignment filtering | Inactive users, unmatched records, low-delivery markets, stockout locations, or outlier accounts are removed after early results are visible. | The comparison can become selected even if the original assignment was randomized. |
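A rough arithmetic sketch shows why control exposure shrinks the contrast. It assumes a deliberately simple model that is not part of any specific test: every exposed unit gains the same effect, unexposed units gain nothing, and both cells share the same baseline. The function name and the example rates are hypothetical.

```python
def observed_difference(true_effect: float,
                        treat_exposure_rate: float,
                        control_exposure_rate: float) -> float:
    """Expected treatment-minus-control difference under partial exposure.

    Simplifying assumptions: each exposed unit gains `true_effect`,
    unexposed units gain nothing, and both cells share one baseline.
    """
    for exposure_rate in (treat_exposure_rate, control_exposure_rate):
        if not 0.0 <= exposure_rate <= 1.0:
            raise ValueError("exposure rates must be between 0 and 1")
    return true_effect * (treat_exposure_rate - control_exposure_rate)


# Hypothetical numbers: a 2.0-point effect, 80% of treatment units reached.
clean = observed_difference(2.0, 0.80, 0.00)  # control fully suppressed
leaky = observed_difference(2.0, 0.80, 0.20)  # 20% of control exposed
print(f"clean holdout: {clean:.2f}, leaky holdout: {leaky:.2f}")
```

Under these assumptions the measured difference scales with the gap in exposure rates, so any control exposure pulls the estimate toward zero. Real leakage can also distort the estimate in other directions, for example when the leaked units differ systematically from the rest of the control cell.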
Pre-launch suppression brief
The suppression brief should be short enough to use, but specific enough that ad operations, analytics, and the buyer can audit the same boundary later.
| Field | Write before launch | Evidence to retain |
|---|---|---|
| Assigned unit | User, household, account, store, market, ZIP code, device, or another unit, with the reason that unit can be separated. | Assignment file, timestamp, eligibility rule, and owner. |
| Suppression surface | Every system that must exclude the control cell: ad server, buying platform, CRM, clean room, email tool, sales list, retail platform, or partner activation. | Suppression upload logs, platform receipts, audience counts, and campaign screenshots or exports. |
| Exposure definition | The treatment that must be blocked from control: specific campaign, creative theme, offer, media channel, package, sales touch, or discount. | Campaign IDs, deal keys, creative IDs, offer IDs, and active dates. |
| Allowed background activity | Business-as-usual messages or evergreen media that control units may still receive. | Written exception list and reason each exception does not invalidate the decision. |
| Leakage tolerance | The maximum acceptable control exposure rate, affected market share, or outcome-observation imbalance before the readout is downgraded. | Pre-stated threshold, escalation owner, and readout language rule. |
| QA cadence | Launch-day check, early delivery check, weekly or mid-flight check, and pre-readout audit. | Dated QA notes with counts, differences, unresolved risks, and owner decisions. |
In-flight QA checks
- Launch-day count reconciliation: Compare eligible treatment and control counts against the activation counts in each platform. A sudden drop in one cell, a mismatched geography count, or a missing suppression receipt should be resolved before delivery ramps.
- Control exposure audit: Report impressions, reach, clicks, emails, offers, sales touches, or other treatment events observed in the control cell. The audit should use the same unit as assignment whenever possible.
- Overlap with other tactics: Check whether control units are entering retargeting pools, search remarketing lists, lead-nurture flows, loyalty offers, partner audiences, or regional promotions during the test window.
- Delivery imbalance: Treatment delivery should be checked against the plan by geography, device, audience, placement, creative, and date. A test with weak treatment exposure may be inconclusive for operational reasons.
- Observation parity: Compare match rates, survey recruitment rates, form processing, lead routing, store coverage, CRM sync timing, and outcome lag across cells. Unequal observation can create or hide lift.
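The control exposure audit and the observation parity check can both be reduced to a few rates computed at the assignment unit. The sketch below is one possible way to do that; the `CellCounts` schema, the field names, and the counts are all hypothetical and would need to be mapped to whatever the platforms actually export.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    """Counts for one test cell, all at the assignment unit (hypothetical schema)."""
    assigned: int         # units assigned to the cell
    exposed: int          # units with at least one treatment event
    outcome_matched: int  # units whose outcomes could be observed


def rate(numerator: int, denominator: int) -> float:
    """Share of units, guarding against an empty cell."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def qa_summary(treatment: CellCounts, control: CellCounts) -> dict:
    """Exposure rates per cell and the treatment-minus-control match-rate gap."""
    treatment_match = rate(treatment.outcome_matched, treatment.assigned)
    control_match = rate(control.outcome_matched, control.assigned)
    return {
        "control_exposure_rate": rate(control.exposed, control.assigned),
        "treatment_exposure_rate": rate(treatment.exposed, treatment.assigned),
        "match_rate_gap": treatment_match - control_match,
    }


# Hypothetical mid-flight counts.
treatment = CellCounts(assigned=10_000, exposed=7_500, outcome_matched=9_000)
control = CellCounts(assigned=10_000, exposed=300, outcome_matched=8_500)
summary = qa_summary(treatment, control)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Recording these rates at each QA date, next to the pre-stated leakage tolerance, makes it possible to see whether exposure or observation drifted during the flight rather than discovering it at readout.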
Leakage severity score
The goal is not to pretend every test can be perfectly isolated. The goal is to decide whether the readout still deserves causal language or should be downgraded to operational evidence.
| Severity | Pattern | How to use the result |
|---|---|---|
| Low | Minor control exposure, symmetric outcome capture, documented cause, and no material change to the treatment-control contrast. | Keep causal language bounded to the tested population, and disclose the leakage check and the sensitivity analysis. |
| Moderate | Noticeable control exposure, partial suppression failure, or a material overlap with another tactic that can be estimated. | Use directional language, show sensitivity ranges, and avoid confident scaling claims until a cleaner test confirms the result. |
| High | Control group received substantial treatment or a close substitute, or one cell had meaningfully different outcome capture. | Do not use the estimate as causal evidence. Treat the report as delivery and operations learning. |
| Unknown | Suppression logs, exposure audits, or observation parity checks are missing. | Downgrade the conclusion because the control condition cannot be verified. |
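The numeric part of the severity table can be sketched as a small classifier. The thresholds below are placeholders, not recommendations; each test should use the tolerance written into its own suppression brief before launch. The function name and parameters are hypothetical, and the sketch ignores the qualitative criteria in the table, such as whether the cause is documented or whether control received a close substitute.

```python
from typing import Optional


def leakage_severity(control_exposure_rate: Optional[float],
                     match_rate_gap: Optional[float],
                     *,
                     low_exposure: float = 0.02,   # placeholder threshold
                     high_exposure: float = 0.10,  # placeholder threshold
                     max_gap: float = 0.05) -> str:  # placeholder threshold
    """Map audit metrics to Low / Moderate / High / Unknown.

    `None` means the audit was not run or its evidence is missing,
    so the control condition cannot be verified.
    """
    if control_exposure_rate is None or match_rate_gap is None:
        return "Unknown"
    if control_exposure_rate >= high_exposure or abs(match_rate_gap) > max_gap:
        return "High"
    if control_exposure_rate > low_exposure:
        return "Moderate"
    return "Low"


# Hypothetical audit results.
for exposure, gap in [(0.01, 0.01), (0.05, 0.01), (0.25, 0.00), (None, 0.00)]:
    print(exposure, gap, leakage_severity(exposure, gap))
```

Treating missing evidence as its own category, rather than defaulting it to Low, mirrors the table: an unverified holdout is downgraded instead of assumed clean.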
Readout language by QA result
| QA finding | Careful wording | Wording to avoid |
|---|---|---|
| Suppression clean, exposure strong, outcome capture balanced. | The test estimates incremental impact for this eligible population, treatment, and readout window. | The channel caused the same lift everywhere. |
| Small control leakage, sensitivity still clears the decision threshold. | The result remains supportive under the documented leakage sensitivity, but should stay bounded to the tested setup. | Leakage was immaterial, so the number is exact. |
| Moderate leakage or overlapping tactics. | The result is directional because the holdout was partially exposed or affected by related activity. | The positive lift proves the treatment worked. |
| Outcome observation differs by cell. | The readout cannot separate behavioral lift from differences in matching, routing, survey response, or data capture. | The measured conversions prove incremental impact. |
| Suppression evidence missing. | The report describes observed performance under an unverified comparison, not a clean causal estimate. | The test design guarantees incrementality. |
Questions for the readout call
- Which systems received the suppression list, and can each system show a dated upload, audience count, or exclusion log?
- Was the assignment unit the same as the delivery and outcome unit? If not, what cross-unit leakage was expected?
- How many control units received impressions, emails, offers, calls, retargeting, or related exposure during the window?
- Were any users, markets, accounts, stores, leads, or outcomes removed after assignment?
- Did treatment and control have comparable match rates, survey recruitment, lead routing, store coverage, and conversion lag?
- What leakage threshold would have downgraded the conclusion before the result was visible?
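The question about units removed after assignment can often be answered by diffing the original assignment file against the IDs used in the analysis. The following is a minimal sketch, assuming the assignment is available as a mapping from unit ID to cell name; the function name, the IDs, and the cell labels are hypothetical.

```python
def removed_after_assignment(assignment: dict, analyzed_ids: set) -> dict:
    """Count, per cell, the assigned units that are missing from the analysis.

    `assignment` maps unit ID -> cell name (hypothetical structure).
    """
    report = {}
    for cell in sorted(set(assignment.values())):
        cell_ids = {unit for unit, c in assignment.items() if c == cell}
        removed = cell_ids - analyzed_ids
        report[cell] = {
            "assigned": len(cell_ids),
            "removed": len(removed),
            "removed_share": len(removed) / len(cell_ids),
        }
    return report


# Hypothetical assignment file and analysis set.
assignment = {
    "u1": "treatment", "u2": "treatment", "u3": "treatment", "u4": "treatment",
    "u5": "control", "u6": "control", "u7": "control", "u8": "control",
}
analyzed_ids = {"u1", "u2", "u3", "u4", "u5", "u6"}
report = removed_after_assignment(assignment, analyzed_ids)
for cell, stats in report.items():
    print(cell, stats)
```

A removal share that differs sharply between cells, as in this made-up example, is a prompt to ask when and why the units were dropped, not proof of a problem on its own.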
Takeaway
A holdout is not protected by intention. It is protected by visible suppression, exposure audits, stable assignment, and equal outcome capture. When those checks are present, a lift result can support a bounded decision. When they are absent, the most honest readout may be that the campaign delivered, but the counterfactual was not clean enough to prove what changed.