Evaluation desk

Evaluate the evidence before you trust the frame.

Editors, analysts, and buyers often face the same problem in different language: a source, vendor, dashboard, or report is asking for confidence. This desk organizes Measurement Press's checklists into a practical workflow for deciding how much confidence the evidence deserves.

Use it when a story makes a strong public claim, a vendor presents a measurement result, a research report becomes budget evidence, or a dashboard is being treated as proof. The goal is not automatic skepticism. The goal is confidence that matches the source quality, comparison class, and causal design.

Start here: choose the route that matches the decision in front of you, then use the worksheet or checklist that forces the source, denominator, comparison, and claim language into view. Use the topic taxonomy when you need to choose a different desk first.

Evaluation routes

Public claim

Check the source trail

Use when a headline, chart, report, or quote needs the original record, denominator, and missing comparison made visible.

Claim wording

Calibrate confidence

Use when the evidence is useful but the sentence needs a narrower verb, clearer base, or visible uncertainty.

Vendor result

Separate credit from lift

Use when a dashboard, attribution readout, model output, or matched conversion report is being treated as budget evidence.

Meeting review

Leave with a decision

Use when a team needs one place for notes, confidence scoring, caveats, owner, and the next evidence request.

When the evidence is an example

A vivid case can help readers understand a claim, but it should not quietly do the work of a denominator, sample, trend, or counterfactual. Use this routing pass when a story, report, presentation, or campaign recap leans on one example more heavily than the disclosed evidence can support.

Example is doing this job | First question | Use first | Then check
Illustrating what can happen in one setting. | Is the page clear that this is an illustration, not a prevalence claim? | Anecdote and exemplar checklist | Denominator examples and disconfirming evidence
Standing in for how common something is. | What population, record universe, sample base, or opportunity base would show frequency? | Denominator framing examples | Public records checklist or survey checklist
Carrying a claim through one source's experience or authority. | Is access, expertise, direct experience, or incentive being mistaken for independent confirmation? | Source role and incentive map | Source triangulation and quote weight
Suggesting a trend or sequence from a recent case. | Does the evidence show change over time, or only an event placed near a conclusion? | Before-and-after trend checklist | Timeline checklist and causal claim protocol
Presenting one campaign, package, or vendor case result as repeatable impact. | Was the comparison chosen before results were visible, and does the outcome match the decision? | Case-study generalizability checklist | Campaign baseline comparison, campaign readout QA, and outcome quality scorecard

Choose the workflow

Use case | First check | Then inspect | Best Baseline page
Editing or sharing a news claim | Headline scope, verb choice, and implied cause. | Source proximity, denominator, rebuttal standard, and missing comparison. | Headline and source-mix framing checklist
Checking quote and response balance | Which quote or record carries the central claim. | Quote roles, evidence parity, response timing, placement, length, and verb choice. | Quote weight and response standard checklist
Auditing a public media frame | The central claim and the comparison made easy for the reader. | Evidence trail, base rate, source roles, uncertainty, and disconfirming context. | Media claim audit worksheet
Learning the audit pattern | One neutral claim that sounds stronger than the visible source trail. | Source triangulation, denominator choice, comparison class, and final claim wording. | Worked media claim example
Checking an anecdote or exemplar claim | The sentence the vivid case, quote, example, or campaign result is being used to prove. | Selection path, denominator, comparison, counterexample, source role, and whether the example supports illustration, prevalence, trend, or cause. | Anecdote and exemplar framing checklist
Checking case-study generalizability | The broader rule, forecast, or recommendation one case is being asked to support. | Case selection, eligible universe, baseline, outcome quality, transfer conditions, and what wording remains true if the case does not repeat. | Case study generalizability checklist
Triangulating multiple sources | Whether several citations independently confirm the same bounded claim. | Closest record, independent corroboration, counter-source, denominator, comparison class, method disclosure, and final language boundary. | Source triangulation checklist
Finding the strongest weakener | The fact, denominator, comparison, counter-source, or method limit that would reduce confidence. | Alternate denominators, missing source roles, fairer comparisons, scope limits, and non-causal explanations. | Disconfirming evidence checklist
Mapping source roles and incentives | Which source role is carrying the frame: record owner, participant, expert, sponsor, vendor, data owner, or commentator. | Access, expertise, financial interest, reputational interest, missing counter-source, and whether authority is standing in for evidence. | Source role and incentive map
Checking denominator framing | The numerator, rate, percentage, or survey share that carries the story. | Population base, opportunity base, reporting base, time window, composition shift, and whether the denominator matches the decision. | Denominator framing examples
Checking average or composition claims | The blended average, rate, share, or index that carries the story. | Subgroup values, subgroup weights, eligibility rules, changing mix, and whether the aggregate moved because the ingredients changed. | Composition mix and average checklist
Reading a public-record or denominator claim | Original record, record owner, reporting window, and whether the number is a count, rate, estimate, or revision. | Population base, eligible cases, missing universe, definition changes, and fair comparison class. | Public records and denominator checklist
Reading a before-and-after or trend claim | The chart's implied claim, source series, shown window, denominator, and event timing. | Seasonality, concurrent changes, method shifts, start-date choice, revision risk, and whether a counterfactual was chosen before results were visible. | Before-and-after chart and trend claim checklist
Reading a timeline or event-order claim | The sequence the story asks readers to accept, and whether chronology is being used as causal evidence. | Timestamped records, prior state, mechanism, lag window, concurrent changes, alternate causes, and comparison timeline. | Timeline and event-order framing checklist
Reviewing a causal claim | The verb that says something caused, drove, reduced, prevented, or changed an outcome. | Counterfactual, comparison timing, alternate explanations, denominator, starting level, and the strongest supportable wording. | Causal claim review protocol
Reading a poll or survey claim | Population, sample source, field dates, and exact question wording. | Sponsor disclosure, weighting, subgroup bases, uncertainty, and whether the headline compares like with like. | Survey and poll claim checklist
Reading a sponsored report or vendor research claim | Sponsor role, report producer, data universe, and intended decision. | Sample selection, method note, denominator, comparison class, limitations, and claim language. | Sponsored research and vendor report checklist
Scoring source quality | Whether the article shows the original record or only a summary. | Completeness, specificity, independence, uncertainty, and replicability. | Source quality scorecard
Running a meeting review | The claim, decision, owner, and confidence level. | Source type, denominator, comparison, measurement design, next action, and final wording. | Source and vendor evaluation worksheet, claim confidence rubric, and evidence-to-claim language matrix
Reviewing a measurement vendor | The decision the result is supposed to support. | Counterfactual, baseline fit, assignment, controls, leakage, incentives, uncertainty, calibration evidence, model sensitivity, signal eligibility, and term boundaries. | Glossary, method selector, next-method guide, baseline comparison checklist, uncertainty interval checklist, MMM readout QA checklist, MMM calibration evidence checklist, holdout planning guide, first-party signal readiness checklist, campaign data-layer spec, campaign readout QA checklist, creative and destination troubleshooting matrix, campaign issue register, renewal memo template, renewal evidence archive, status-window closeout checklist, status-window closeout register, and renewal follow-up tracker
Preparing a campaign reporting handoff | Which fields must exist before launch and how the finished report should separate claims. | Reader job, inventory scope, campaign, placement, creative, destination, audience, test-cell, outcome, duplicate, eligibility, change-log, context, traffic-quality, lead-status, matchback, attribution, and incrementality terms. | Contextual package proof checklist, campaign readiness dashboard, contextual campaign brief template, first-party signal readiness checklist, campaign data-layer spec, campaign tagging QA checklist, campaign reporting sample, creative and destination troubleshooting matrix, campaign issue register, renewal memo template, renewal evidence archive, status-window closeout checklist, status-window closeout register, renewal follow-up tracker, and campaign reporting terms glossary
Reviewing campaign exposure quality | Whether served impressions had a measurable, viewable, valid opportunity to reach the intended audience. | Measured and unmeasured counts, viewability definition, invalid-traffic filtering, placement context, and frequency concentration. | Viewability and invalid traffic measurement checklist and reach and frequency checklist
Reviewing a matched conversion or clean-room report | The identity unit, eligible universe, match method, outcome source, and reporting window. | Match-rate selection, prior intent, duplicate outcomes, privacy thresholds, leakage, and whether a comparison was designed before launch. | Identity matchback measurement checklist, status-window closeout checklist, and status-window closeout register
Reviewing attribution windows or conversion lag | The credited outcome, touchpoint rule, lookback window, conversion window, data lag, and reporting cutoff. | Immature cohorts, window shopping, prior intent, channel blind spots, concurrent activity, and whether credited outcomes are being described as lift. | Attribution window and conversion lag checklist, status-window closeout checklist, and status-window closeout register
Reviewing an audience targeting result | The behavior, match, model score, or eligibility rule that put people in the audience before exposure. | Pre-existing purchase intent, reachability, loyalty, category buying, retargeting eligibility, and whether the comparison was protected before results were visible. | Audience selection bias checklist
Evaluating a lift or brand result | Who could enter the measured group before the result was observed. | Holdout quality, survey recruitment, short windows, proxy outcomes, and readout rules. | Brand lift readout checklist and lift test traps
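
The composition row in the table above asks whether an aggregate moved because the ingredients changed. A minimal Python sketch of that check follows. The segment names and counts are hypothetical, invented only to illustrate the arithmetic, and the function names are not taken from any Measurement Press checklist.

```python
# Hypothetical data: each segment maps to (units, events). All numbers are
# invented for illustration and do not come from any real report.
period_1 = {"segment_a": (800, 80), "segment_b": (200, 60)}
period_2 = {"segment_a": (500, 50), "segment_b": (500, 150)}


def subgroup_rates(segments):
    """Return the event rate inside each segment."""
    return {name: events / units for name, (units, events) in segments.items()}


def blended_rate(segments):
    """Return total events divided by total units across all segments."""
    total_units = sum(units for units, _ in segments.values())
    total_events = sum(events for _, events in segments.values())
    return total_events / total_units


for label, period in (("period 1", period_1), ("period 2", period_2)):
    print(label, subgroup_rates(period), round(blended_rate(period), 3))
```

In this invented case the blended rate moves from 0.14 to 0.20 while both subgroup rates stay at 0.10 and 0.30, so the aggregate changed only because the mix shifted toward the higher-rate segment.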

The five-question pass

1. What is the claim?

Rewrite the headline, vendor result, or report summary as one testable sentence. Remove mood words and decide whether the claim is factual, comparative, predictive, or causal.

2. What is the source?

Separate primary records, method-bearing analysis, expert interpretation, interested-party statements, and secondhand summaries. Use the source triangulation checklist when several citations may trace back to the same origin.

3. What is the denominator?

Find the population, base rate, time period, and starting level. A percentage change, quote count, lift number, or engagement rate is weak evidence until the denominator is visible.
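
To see why the starting level and base matter, the same pair of counts can be restated as a relative change and as a rate. This is a rough Python sketch under stated assumptions: the function name and the figures are hypothetical, and the population is assumed to be the same in both periods.

```python
def describe_change(before_count, after_count, population):
    """Report a change in counts as both a relative change and a rate change.

    Assumes the same population applies to both periods (an illustrative
    simplification, not a rule from the checklists).
    """
    if before_count <= 0 or population <= 0:
        raise ValueError("before_count and population must be positive")
    return {
        "relative_change": (after_count - before_count) / before_count,
        "before_rate": before_count / population,
        "after_rate": after_count / population,
        "rate_change": (after_count - before_count) / population,
    }


# Hypothetical figures: incidents rise from 2 to 4 in a population of 10,000.
summary = describe_change(before_count=2, after_count=4, population=10_000)
print(summary)
```

With these invented figures, "incidents doubled" and "incidents rose by 2 per 10,000" describe the same records. Showing both lets the reader judge which wording the evidence supports.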

4. What is the comparison?

Name the comparison class the reader is being invited to use. Then ask what comparison would be fairer: prior period, peer group, holdout, matched geography, unexposed control, or expected baseline.
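
As one illustration of a fairer comparison, a holdout lets credited outcomes be set against an expected baseline. The Python sketch below is a simplified point estimate only. It assumes the holdout was assigned before results were visible and is a fair stand-in for the exposed group, and the function name and all figures are hypothetical.

```python
def holdout_readout(exposed_users, exposed_conversions,
                    holdout_users, holdout_conversions):
    """Compare an exposed group with a holdout and separate credit from lift."""
    if min(exposed_users, holdout_users) <= 0:
        raise ValueError("both groups need at least one user")
    exposed_rate = exposed_conversions / exposed_users
    holdout_rate = holdout_conversions / holdout_users
    # Conversions the exposed group would be expected to produce anyway,
    # assuming the holdout is a fair stand-in for it.
    expected_baseline = holdout_rate * exposed_users
    return {
        "credited_conversions": exposed_conversions,
        "expected_baseline": expected_baseline,
        "incremental_conversions": exposed_conversions - expected_baseline,
        # Relative lift is undefined when the holdout has no conversions.
        "relative_lift": ((exposed_rate - holdout_rate) / holdout_rate
                          if holdout_rate > 0 else None),
    }


# Hypothetical figures: 500 of 10,000 exposed users convert, 80 of 2,000 held out.
readout = holdout_readout(10_000, 500, 2_000, 80)
print(readout)
```

In this invented case a dashboard could credit 500 conversions, while the holdout comparison suggests roughly 100 of them are incremental. The sketch reports no uncertainty interval, so a real readout would still need one before the number carries a budget decision.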

5. What would change your mind?

List the missing document, counterexample, sensitivity test, confidence interval, pre-period trend, or disconfirming source that would materially reduce confidence. Use the disconfirming evidence checklist when that weakener is not yet specific.

Vendor review prompts

Vendor evaluation should stay practical. A tool does not need to answer every causal question, but it should be clear about what its evidence can and cannot support.

Vendor claim | Ask for | Watch for
"We measure incremental impact" | Assignment method, holdout definition, exclusion rules, and uncertainty intervals. | Matched reporting dressed up as randomized evidence.
"Our attribution is more accurate" | Validation method, ground truth source, identity limits, channel blind spots, attribution window, conversion lag, and mature-cohort cutoff. | Credit allocation presented as causal lift.
"We match offline conversions" | Identity unit, match rate by group, unmatched universe, deduplication rules, outcome source, and comparison design. | Matched outcomes presented as incremental outcomes.
"This audience performs better" | Pre-campaign behavior, eligibility criteria, model score, match rate, bid differences, and conversion window. | Selection effects that existed before exposure.
"Our inventory is highly viewable" | Measured and unmeasured impression counts, viewability definition, invalid-traffic handling, placement IDs, and frequency distribution. | Exposure-quality claims promoted into sales or brand lift claims.
"The model predicts sales" | Out-of-sample fit, calibration against experiments, priors, controls, and sensitivity tests. | Predictive accuracy substituted for causal accuracy.
"Brand metrics improved" | Recruitment method, exposed/control balance, sample size, question wording, and field dates. | Survey response bias or memory effects treated as market impact.

Reader standard

A serious evaluation does not require perfect information. It requires visible limits. Strong pages and strong vendors make the evidence trail easier to inspect, separate measurement from interpretation, and keep claims within the supportable comparison.