Evaluation desk
Evaluate the evidence before you trust the frame.
Editors, analysts, and buyers often face the same problem in different language: a source, vendor, dashboard, or report is asking for confidence. This desk organizes Measurement Press's checklists into a practical workflow for deciding how much confidence the evidence deserves.
Use it when a story makes a strong public claim, a vendor presents a measurement result, a research report becomes budget evidence, or a dashboard is being treated as proof. The goal is not automatic skepticism. The goal is confidence that matches the source quality, comparison class, and causal design.
Evaluation routes
Check the source trail
Use when a headline, chart, report, or quote needs the original record, denominator, and missing comparison made visible.
Calibrate confidence
Use when the evidence is useful but the sentence needs a narrower verb, clearer base, or visible uncertainty.
Separate credit from lift
Use when a dashboard, attribution readout, model output, or matched conversion report is being treated as budget evidence.
Leave with a decision
Use when a team needs one place for notes, confidence scoring, caveats, owner, and the next evidence request.
When the evidence is an example
A vivid case can help readers understand a claim, but it should not quietly do the work of a denominator, sample, trend, or counterfactual. Use this routing pass when a story, report, presentation, or campaign recap leans on one example more heavily than the disclosed evidence can support.
| Example is doing this job | First question | Use first | Then check |
|---|---|---|---|
| Illustrating what can happen in one setting. | Is the page clear that this is an illustration, not a prevalence claim? | Anecdote and exemplar checklist | denominator examples and disconfirming evidence |
| Standing in for how common something is. | What population, record universe, sample base, or opportunity base would show frequency? | Denominator framing examples | public records checklist or survey checklist |
| Carrying a claim through one source's experience or authority. | Is access, expertise, direct experience, or incentive being mistaken for independent confirmation? | Source role and incentive map | source triangulation and quote weight |
| Suggesting a trend or sequence from a recent case. | Does the evidence show change over time, or only an event placed near a conclusion? | Before-and-after trend checklist | timeline checklist and causal claim protocol |
| Presenting one campaign, package, or vendor case result as repeatable impact. | Was the comparison chosen before results were visible, and does the outcome match the decision? | Case-study generalizability checklist | campaign baseline comparison, campaign readout QA, and outcome quality scorecard |
Choose the workflow
| Use case | First check | Then inspect | Best Baseline page |
|---|---|---|---|
| Editing or sharing a news claim | Headline scope, verb choice, and implied cause. | Source proximity, denominator, rebuttal standard, and missing comparison. | Headline and source-mix framing checklist |
| Checking quote and response balance | Which quote or record carries the central claim. | Quote roles, evidence parity, response timing, placement, length, and verb choice. | Quote weight and response standard checklist |
| Auditing a public media frame | The central claim and the comparison made easy for the reader. | Evidence trail, base rate, source roles, uncertainty, and disconfirming context. | Media claim audit worksheet |
| Learning the audit pattern | One neutral claim that sounds stronger than the visible source trail. | Source triangulation, denominator choice, comparison class, and final claim wording. | Worked media claim example |
| Checking an anecdote or exemplar claim | The sentence the vivid case, quote, example, or campaign result is being used to prove. | Selection path, denominator, comparison, counterexample, source role, and whether the example supports illustration, prevalence, trend, or cause. | Anecdote and exemplar framing checklist |
| Checking case-study generalizability | The broader rule, forecast, or recommendation one case is being asked to support. | Case selection, eligible universe, baseline, outcome quality, transfer conditions, and what wording remains true if the case does not repeat. | Case-study generalizability checklist |
| Triangulating multiple sources | Whether several citations independently confirm the same bounded claim. | Closest record, independent corroboration, counter-source, denominator, comparison class, method disclosure, and final language boundary. | Source triangulation checklist |
| Finding the strongest weakener | The fact, denominator, comparison, counter-source, or method limit that would reduce confidence. | Alternate denominators, missing source roles, fairer comparisons, scope limits, and non-causal explanations. | Disconfirming evidence checklist |
| Mapping source roles and incentives | Which source role is carrying the frame: record owner, participant, expert, sponsor, vendor, data owner, or commentator. | Access, expertise, financial interest, reputational interest, missing counter-source, and whether authority is standing in for evidence. | Source role and incentive map |
| Checking denominator framing | The numerator, rate, percentage, or survey share that carries the story. | Population base, opportunity base, reporting base, time window, composition shift, and whether the denominator matches the decision. | Denominator framing examples |
| Checking average or composition claims | The blended average, rate, share, or index that carries the story. | Subgroup values, subgroup weights, eligibility rules, changing mix, and whether the aggregate moved because the ingredients changed. | Composition mix and average checklist |
| Reading a public-record or denominator claim | Original record, record owner, reporting window, and whether the number is a count, rate, estimate, or revision. | Population base, eligible cases, missing universe, definition changes, and fair comparison class. | Public records and denominator checklist |
| Reading a before-and-after or trend claim | The chart's implied claim, source series, shown window, denominator, and event timing. | Seasonality, concurrent changes, method shifts, start-date choice, revision risk, and whether a counterfactual was chosen before results were visible. | Before-and-after chart and trend claim checklist |
| Reading a timeline or event-order claim | The sequence the story asks readers to accept, and whether chronology is being used as causal evidence. | Timestamped records, prior state, mechanism, lag window, concurrent changes, alternate causes, and comparison timeline. | Timeline and event-order framing checklist |
| Reviewing a causal claim | The verb that says something caused, drove, reduced, prevented, or changed an outcome. | Counterfactual, comparison timing, alternate explanations, denominator, starting level, and the strongest supportable wording. | Causal claim review protocol |
| Reading a poll or survey claim | Population, sample source, field dates, and exact question wording. | Sponsor disclosure, weighting, subgroup bases, uncertainty, and whether the headline compares like with like. | Survey and poll claim checklist |
| Reading a sponsored report or vendor research claim | Sponsor role, report producer, data universe, and intended decision. | Sample selection, method note, denominator, comparison class, limitations, and claim language. | Sponsored research and vendor report checklist |
| Scoring source quality | Whether the article shows the original record or only a summary. | Completeness, specificity, independence, uncertainty, and replicability. | Source quality scorecard |
| Running a meeting review | The claim, decision, owner, and confidence level. | Source type, denominator, comparison, measurement design, next action, and final wording. | Source and vendor evaluation worksheet, claim confidence rubric, and evidence-to-claim language matrix |
| Reviewing a measurement vendor | The decision the result is supposed to support. | Counterfactual, baseline fit, assignment, controls, leakage, incentives, uncertainty, calibration evidence, model sensitivity, signal eligibility, and term boundaries. | Glossary, method selector, next-method guide, baseline comparison checklist, uncertainty interval checklist, MMM readout QA checklist, MMM calibration evidence checklist, holdout planning guide, first-party signal readiness checklist, campaign data-layer spec, campaign readout QA checklist, creative and destination troubleshooting matrix, campaign issue register, renewal memo template, renewal evidence archive, status-window closeout checklist, status-window closeout register, and renewal follow-up tracker |
| Preparing a campaign reporting handoff | Which fields must exist before launch and how the finished report should separate claims. | Reader job, inventory scope, campaign, placement, creative, destination, audience, test-cell, outcome, duplicate, eligibility, change-log, context, traffic-quality, lead-status, matchback, attribution, and incrementality terms. | Contextual package proof checklist, campaign readiness dashboard, contextual campaign brief template, first-party signal readiness checklist, campaign data-layer spec, campaign tagging QA checklist, campaign reporting sample, creative and destination troubleshooting matrix, campaign issue register, renewal memo template, renewal evidence archive, status-window closeout checklist, status-window closeout register, renewal follow-up tracker, and campaign reporting terms glossary |
| Reviewing campaign exposure quality | Whether served impressions had a measurable, viewable, valid opportunity to reach the intended audience. | Measured and unmeasured counts, viewability definition, invalid-traffic filtering, placement context, and frequency concentration. | Viewability and invalid traffic measurement checklist and reach and frequency checklist |
| Reviewing a matched conversion or clean-room report | The identity unit, eligible universe, match method, outcome source, and reporting window. | Match-rate selection, prior intent, duplicate outcomes, privacy thresholds, leakage, and whether a comparison was designed before launch. | Identity matchback measurement checklist, status-window closeout checklist, and status-window closeout register |
| Reviewing attribution windows or conversion lag | The credited outcome, touchpoint rule, lookback window, conversion window, data lag, and reporting cutoff. | Immature cohorts, window shopping, prior intent, channel blind spots, concurrent activity, and whether credited outcomes are being described as lift. | Attribution window and conversion lag checklist, status-window closeout checklist, and status-window closeout register |
| Reviewing an audience targeting result | The behavior, match, model score, or eligibility rule that put people in the audience before exposure. | Pre-existing purchase intent, reachability, loyalty, category buying, retargeting eligibility, and whether the comparison was protected before results were visible. | Audience selection bias checklist |
| Evaluating a lift or brand result | Who could enter the measured group before the result was observed. | Holdout quality, survey recruitment, short windows, proxy outcomes, and readout rules. | Brand lift readout checklist and lift test traps |
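The average and composition row in the table above is easier to see with numbers. The sketch below is a minimal illustration, assuming two hypothetical subgroups with invented counts; `blended_rate`, `segment_rates`, and the segment names are assumptions made for this example and do not come from any Measurement Press checklist. It shows a blended rate rising while both subgroup rates fall, because the mix shifted toward the higher-rate group.

```python
def blended_rate(segments):
    """Overall rate across segments given as {name: (outcomes, eligible)}."""
    total_outcomes = sum(outcomes for outcomes, _ in segments.values())
    total_eligible = sum(eligible for _, eligible in segments.values())
    return total_outcomes / total_eligible


def segment_rates(segments):
    """Rate inside each segment, so the ingredients stay visible."""
    return {
        name: outcomes / eligible
        for name, (outcomes, eligible) in segments.items()
    }


# Hypothetical counts, invented for illustration only.
period_1 = {"returning": (100, 1000), "new": (20, 1000)}
period_2 = {"returning": (144, 1600), "new": (6, 400)}

rates_1 = segment_rates(period_1)
rates_2 = segment_rates(period_2)

print("Blended:", blended_rate(period_1), "->", blended_rate(period_2))
for name in period_1:
    print(name, rates_1[name], "->", rates_2[name])
```

In this invented case the blended rate moves from 6% to 7.5% even though the returning-visitor rate drops from 10% to 9% and the new-visitor rate drops from 2% to 1.5%. The aggregate moved because the ingredients changed, which is the pattern the composition check is meant to surface.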
The five-question pass
1. What is the claim? Rewrite the headline, vendor result, or report summary as one testable sentence. Remove mood words and decide whether the claim is factual, comparative, predictive, or causal.
2. What is the source? Separate primary records, method-bearing analysis, expert interpretation, interested-party statements, and secondhand summaries. Use the source triangulation checklist when several citations may trace back to the same origin.
3. What is the denominator? Find the population, base rate, time period, and starting level. A percentage change, quote count, lift number, or engagement rate is weak until the denominator is visible.
4. What is the comparison? Name the comparison class the reader is being invited to use. Then ask what comparison would be fairer: prior period, peer group, holdout, matched geography, exposed control, or expected baseline.
5. What would change your mind? List the missing document, counterexample, sensitivity test, confidence interval, pre-period trend, or disconfirming source that would materially reduce confidence. Use the disconfirming evidence checklist when that weakener is not yet specific.
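The denominator question in the pass above can be worked through as simple arithmetic. The following is a minimal sketch, assuming invented counts and a hypothetical helper named `describe_change`; it is not a Measurement Press tool. It compares the percentage change in a raw count with the percentage change in the rate once the base is made visible.

```python
def describe_change(before_count, after_count, before_base, after_base):
    """Compare change in a raw count with change in the rate per 1,000."""
    count_change = (after_count - before_count) / before_count
    before_rate = before_count * 1000 / before_base
    after_rate = after_count * 1000 / after_base
    rate_change = (after_rate - before_rate) / before_rate
    return {
        "count_change": count_change,
        "before_rate_per_1000": before_rate,
        "after_rate_per_1000": after_rate,
        "rate_change": rate_change,
    }


# Hypothetical figures, invented for illustration only:
# 40 incidents among 10,000 eligible cases, then 60 among 20,000.
result = describe_change(40, 60, 10_000, 20_000)

print(f"Raw count change: {result['count_change']:+.0%}")
print(f"Rate per 1,000: {result['before_rate_per_1000']} -> "
      f"{result['after_rate_per_1000']}")
print(f"Rate change: {result['rate_change']:+.0%}")
```

With these invented figures the count rises 50% while the rate per 1,000 falls 25%. Both sentences are arithmetically true; the denominator decides which one matches the claim being made.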
Vendor review prompts
Vendor evaluation should stay practical. A tool does not need to answer every causal question, but it should be clear about what its evidence can and cannot support.
| Vendor claim | Ask for | Watch for |
|---|---|---|
| "We measure incremental impact" | Assignment method, holdout definition, exclusion rules, and uncertainty intervals. | Matched reporting dressed up as randomized evidence. |
| "Our attribution is more accurate" | Validation method, ground truth source, identity limits, channel blind spots, attribution window, conversion lag, and mature-cohort cutoff. | Credit allocation presented as causal lift. |
| "We match offline conversions" | Identity unit, match rate by group, unmatched universe, deduplication rules, outcome source, and comparison design. | Matched outcomes presented as incremental outcomes. |
| "This audience performs better" | Pre-campaign behavior, eligibility criteria, model score, match rate, bid differences, and conversion window. | Selection effects that existed before exposure. |
| "Our inventory is highly viewable" | Measured and unmeasured impression counts, viewability definition, invalid-traffic handling, placement IDs, and frequency distribution. | Exposure-quality claims promoted into sales or brand lift claims. |
| "The model predicts sales" | Out-of-sample fit, calibration against experiments, priors, controls, and sensitivity tests. | Predictive accuracy substituted for causal accuracy. |
| "Brand metrics improved" | Recruitment method, exposed/control balance, sample size, question wording, and field dates. | Survey response bias or memory effects treated as market impact. |
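Several rows in the table above turn on the gap between credited outcomes and incremental lift. The sketch below is one hedged way to make that gap concrete, assuming a randomized holdout that was set before launch, invented counts, and a hypothetical function named `lift_with_interval`. It uses a standard normal-approximation interval for a difference in two proportions and is not a description of any vendor's method.

```python
import math


def lift_with_interval(exposed_conv, exposed_n, holdout_conv, holdout_n, z=1.96):
    """Difference in conversion rates with a normal-approximation interval.

    Assumes people were randomly assigned to exposed and holdout groups
    before launch; without that, the difference is not a lift estimate.
    """
    p_exposed = exposed_conv / exposed_n
    p_holdout = holdout_conv / holdout_n
    diff = p_exposed - p_holdout
    se = math.sqrt(
        p_exposed * (1 - p_exposed) / exposed_n
        + p_holdout * (1 - p_holdout) / holdout_n
    )
    return diff, diff - z * se, diff + z * se


# Hypothetical readout, invented for illustration only.
credited_conversions = 300  # what an attribution report would credit
diff, low, high = lift_with_interval(300, 10_000, 270, 10_000)
incremental_estimate = diff * 10_000  # exposed-group size times rate difference

print(f"Credited conversions: {credited_conversions}")
print(f"Estimated incremental conversions: {incremental_estimate:.0f}")
print(f"Rate difference: {diff:.4f} (approx. 95% interval {low:.4f} to {high:.4f})")
```

In this invented readout the attribution report credits 300 conversions, the holdout comparison suggests roughly 30 incremental ones, and the approximate interval spans zero. "Credited" is a supportable verb here; "caused" is not yet.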
Reader standard
A serious evaluation does not require perfect information. It requires visible limits. Strong pages and strong vendors make the evidence trail easier to inspect, separate measurement from interpretation, and keep claims within the supportable comparison.