
When the Data Doesn’t Tell the Story: 5 Pitfalls School Systems and EdTechs Meet in Impact Evaluation (and How to Avoid Them)

By 4:30 p.m., the conference table tells the story of another long day: coffee rings, half-scribbled notes, and a stack of vendor decks claiming “proven impact.” The superintendent wants data for the board packet. The finance chief asks what will justify renewal. Instructional leaders scan spreadsheets, trying to connect logins to learning. Everyone agrees that students come first, but no one agrees on which evidence counts. And that’s a problem that ripples beyond just one department.  

Across districts, the pattern repeats. Choices pile up faster than the proof to support them. Budgets shrink, timelines tighten, and the studies meant to clarify decisions often cloud them instead. The challenge is clarity, not commitment. Many educational impact evaluations promise rigor but fail to capture the real story behind implementation, timing, or context. Many of these reports share a handful of common weaknesses, and you need to know them to put the right tools to work.

This article breaks down five common pitfalls we see in independent educational program evaluations and rapid-cycle educational impact evaluations, and how a MomentMN Snapshot Report helps districts and partners get the answers they actually need on schedule.

Pitfall #1: Unclear or Unrealistic Theory of Change

A theory of change is the simple, concrete story of how a product or service leads to student outcomes. It names the activities, the people involved, the dosage, the time horizon, and the expected changes (e.g., “With 30 focused minutes per day, Tier 2 readers gain X points on the midyear benchmark”). But when the theory is fuzzy, or a patchwork of optimistic claims, evaluations wobble.

Why it breaks evaluations. Without a clear theory of change, teams collect the wrong data at the wrong time, ask vague questions, or chase outcomes that aren’t plausible yet. Gugerty and Karlan’s guidance on when not to measure impact argues that poor fit between a program’s maturity and the evaluation design wastes time and obscures learning; right-fit measurement starts with a tight theory of change and a question the design can answer. 

A K-12 example. A district wants to evaluate a new tutoring service. Who is the intended beneficiary: EL students, Tier 2 readers, or students with chronic absenteeism? How much tutoring constitutes “real exposure”: four sessions, eight, twelve? What exactly should change in the outcome data, and by when? If you don’t specify those details, your study may report “some gain” without the clarity that boards and funders need to act.

A quick fix. Map backward. What student outcome must move? Over what time horizon? For which subgroup? What activity, dosage, and conditions create the change? Then align data collection and design to that map. If your theory of change assumes 30 minutes per day, don’t evaluate after six weeks of sporadic 10-minute usage. You need evidence of real exposure before an impact claim means anything.
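
For teams that want to make that map concrete, here is a minimal sketch of what a backward-mapped claim can look like when written down as structured data. The field names and example values are hypothetical illustrations, not a MomentMN template:

    from dataclasses import dataclass

    @dataclass
    class TheoryOfChangeClaim:
        """One falsifiable claim an evaluation can test. All values below are hypothetical."""
        outcome: str        # what student outcome must move
        subgroup: str       # for whom
        time_horizon: str   # by when
        activity: str       # what activity creates the change
        dosage: str         # how much exposure counts as "real exposure"

    tutoring_claim = TheoryOfChangeClaim(
        outcome="gain of X points on the midyear reading benchmark",
        subgroup="Tier 2 readers in grades 3-5",
        time_horizon="fall baseline to midyear benchmark",
        activity="small-group tutoring, 30 focused minutes per day",
        dosage="at least 12 sessions before the midyear assessment",
    )
    print(tutoring_claim)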

Where the Snapshot helps. The MomentMN Snapshot begins with a sharp question (for whom, by when, under what exposure) and then builds the comparison to match. You don’t end up chasing vague “impact”; you investigate a falsifiable claim tied to your actual decision, with evidence you can stand behind.

Pitfall #2: Implementation and Context Misalignment

Even well-designed products can miss their mark when the rollout’s “how” is off: dosage, fidelity, staffing, or schedule. Impact studies in education repeatedly flag variation in implementation and insufficient fidelity as major validity threats. Song’s overview of critical issues and common pitfalls in designing impact studies centers precisely on these hazards: when implementation strays, inferences suffer. 

A K-12 example. A math app shows gains in vetted pilots, but once adopted widely, average weekly usage is 10 minutes instead of the required 30. Coaching is thin. Schedules are fragmented by testing windows and assemblies. Your evaluation returns a null finding not because the tool is ineffective in principle, but because the conditions changed.

A quick fix. Ask early: Are teachers trained consistently? Is usage tracked at the student level? Is the new context reasonably similar to pilot conditions? If not, adjust expectations or postpone a summative impact claim. Name implementation as a measured variable, not a footnote. That means capturing dosage, fidelity indicators, and key contextual constraints.
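
To make that concrete, here is a minimal sketch of how a district data team might turn raw usage logs into dosage and fidelity variables. The file name, column names, and the 150-minutes-per-week target are illustrative assumptions, not MomentMN requirements:

    import pandas as pd

    # Hypothetical usage log: one row per student session.
    # Assumed columns: student_id, session_date, minutes
    usage = pd.read_csv("usage_log.csv")

    # Dosage as a measured variable: average weekly minutes per student.
    usage["week"] = pd.to_datetime(usage["session_date"]).dt.to_period("W")
    weekly = usage.groupby(["student_id", "week"])["minutes"].sum().reset_index()
    dosage = weekly.groupby("student_id")["minutes"].mean().rename("avg_weekly_minutes")

    # Fidelity flag against the theory-of-change target (30 min/day is roughly 150 min/week).
    fidelity = (dosage >= 150).rename("met_dosage_target")

    implementation = pd.concat([dosage, fidelity], axis=1)
    print(implementation["met_dosage_target"].mean())  # share of students at or above target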

Where the Snapshot helps. The Snapshot leans on the district data you already collect (usage, attendance, benchmarks) and models the outcomes under real-world variation. You get a defensible read in messy conditions rather than waiting for laboratory-perfect fidelity that never arrives.

Pitfall #3: Treating Monitoring or Usage as Proof of Impact

Monitoring asks, “Did the thing happen?” Impact asks, “Did it change outcomes?” If your “evaluation” is mostly logins, minutes used, or staff training counts, you’re describing activity, not effect. Howard White calls this out bluntly: many reports presented as evaluations are really performance monitoring, and they lack the outcome evidence you need to make sound decisions.

Why it matters. Boards don’t ask, “How many times did students click?” They ask, “What improved?” Usage can be a necessary condition, but it’s not sufficient. Emerging commentary on EdTech cautions against treating screen time as learning time; the World Education Blog recently highlighted “usage taken for learning scores” as a widespread mistake, especially with digital tools whose dashboards look persuasive but don’t map to achievement.

A K-12 example. A vendor shows that 90% of students logged in. Great. But did benchmark scores move? Did chronic absenteeism drop? Did referrals fall among the targeted subgroup? Without outcome measures, the story is incomplete, and you won’t be able to take steps that actually benefit your students. 

A quick fix. Define a primary outcome up front (e.g., growth on a district assessment, change in course pass rates, reduction in absenteeism) and design the analysis to detect a plausible effect size. Treat usage as a mediator or moderator, not the finish line.
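
As a rough illustration, the sketch below models a pre-specified primary outcome and lets usage enter as a moderator of the treatment effect rather than as the endpoint. The file, column names, and model form are assumptions for the example, not a prescribed specification:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical analytic file: one row per student.
    # Assumed columns: post_score, pre_score, treated (0/1), avg_weekly_minutes
    df = pd.read_csv("analytic_file.csv")

    # The primary outcome is benchmark growth; usage moderates the treatment effect
    # instead of standing in for it.
    model = smf.ols(
        "post_score ~ pre_score + treated + treated:avg_weekly_minutes",
        data=df,
    ).fit()
    print(model.summary())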

Where the Snapshot helps. The Snapshot centers on outcomes the district already tracks and explains why effects vary (dosage thresholds, timing, subgroup differences). You’re not dazzled by pretty dashboards; you’re weighing actual learning and engagement shifts that matter when you make crucial decisions.

Pitfall #4: Poor Comparison Group or Design Limitations

To claim impact, you need a credible counterfactual: evidence of what would have happened without the program. In many districts, randomization isn’t feasible. That’s fine, but then the comparison design carries the weight: matching, controls, and sensitivity checks. When those are weak, your “impact” may reflect selection, not the product.

What the field says. Recent guidance on EdTech impact evaluations flags two linked issues: mis-measuring usage (treating it as learning) and building weak comparison groups that don’t reflect realistic alternatives. The World Education Blog’s three-pitfall piece lays this out concisely for practitioners, spelling out what not to do and how to avoid common missteps.

On the methods side, non-experimental designs can work better than many fear—if you do them carefully. Evidence from within-study comparisons and meta-analyses suggests that the bias between quasi-experimental and experimental estimates is often smaller than expected when studies include rich pre-treatment covariates and well-specified models. 

A K-12 example. The district rolls out a tool first to schools with the most motivated principals. Surprise: treated schools outperform. Is that the tool or the leadership quality? If your comparison group excludes similar schools, or if you ignore pre-intervention gaps, your result will overstate impact.

A quick fix. When you can’t randomize, match. Use prior achievement, attendance, demographics, and other relevant covariates to build a comparison group that mirrors the treatment group. Pre-register the design if you can. Report robustness checks. If dosage varies widely, analyze effects by exposure tier. Don’t tuck the comparison logic in an appendix; put it in the narrative so board members can see that the comparison is fair.
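
For analysts who want a starting point, here is a minimal sketch of 1:1 nearest-neighbor matching on a propensity score built from pre-intervention covariates. The file, column names, and covariate list are hypothetical, and a real analysis would add balance diagnostics and the robustness checks noted above:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    # Hypothetical student-level file.
    # Assumed columns: treated (0/1), post_score, prior_score, attendance_rate, frl, el
    df = pd.read_csv("students.csv")
    covariates = ["prior_score", "attendance_rate", "frl", "el"]

    # Propensity score: probability of treatment given pre-intervention covariates.
    ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
    df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]

    # 1:1 nearest-neighbor match on the propensity score (with replacement).
    treated = df[df["treated"] == 1]
    control = df[df["treated"] == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, match_idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[match_idx.ravel()]

    # First-pass estimate: difference in mean outcomes on the matched sample.
    effect = treated["post_score"].mean() - matched_control["post_score"].mean()
    print(f"Matched mean difference in post_score: {effect:.2f}")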

Where the Snapshot helps. MomentMN’s model uses district data to build statistically sound comparisons aligned to your decision window. We don’t promise laboratory certainty. Instead, we deliver transparent assumptions, clear matching logic, plain-language results, and practical next steps.

Pitfall #5: Timing, Scale, and the Decision Window Don’t Match

If the timing and scale of an evaluation don’t line up with the window in which you have to decide, the results will suffer. You can miss in three ways:

  1. Too soon: You measure before implementation stabilizes, so you capture noise.
  2. Too late: Findings land after renewals or budget votes.
  3. Too small: Exposure or sample size can’t detect the effect you care about.

What the field says. Gugerty and Karlan argue that timing and readiness matter; evaluating while a program is still being adapted can be wasteful and misleading. Right-fit evidence depends on program maturity and evaluation feasibility. Other experts warn against starting an evaluation at the wrong stage or without an adequate scale to support valid conclusions; they offer practical steps for recruitment/retention and power. 

A K-12 example. A district wants to renew a literacy tool next spring. The product launched in September. Baselines are inconsistent, usage is still stabilizing, and teacher training is staggered. An impact study in November is likely to show weak or inconclusive results, not because the tool lacks promise but because the clock and scale aren’t on your side.

A quick fix. Align evidence with the decision calendar. Ask: By when do we need findings? Will exposure be sufficient for an effect of practical interest? If not, consider a diagnostic phase now (implementation, dosage, subgroup reach) and run an impact study after winter when dosage is steady. Or scope the evaluation to the schools or grades where exposure will be strong enough to support a confident read.
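
One quick way to pressure-test the “will exposure and sample size be sufficient” question is a back-of-the-envelope minimum detectable effect calculation. The sketch below uses statsmodels with hypothetical numbers (120 students per arm, 80% power, two-sided alpha of 0.05):

    from statsmodels.stats.power import TTestIndPower

    # Solve for the smallest standardized effect (Cohen's d) the planned sample can detect.
    # The sample size, power, and alpha below are hypothetical placeholders.
    analysis = TTestIndPower()
    mde = analysis.solve_power(nobs1=120, ratio=1.0, alpha=0.05, power=0.80)
    print(f"Minimum detectable effect with 120 students per arm: d = {mde:.2f}")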

Where the Snapshot helps. The Snapshot is built for tight decision windows, so you never miss an opportunity to identify areas of improvement or gaps to fill. We use the data you already collect, focus on a narrow, decision-relevant question, and deliver a short, defensible report in time to influence renewal, procurement, or grant reporting.

Why a MomentMN Snapshot Report Sidesteps the Pitfalls

  • Independent and third-party. Findings land differently when they come from a neutral team. Superintendents can bring a third-party report to the board without eyebrows raised. Donors and community partners listen longer when the evidence isn’t self-published.
  • Rapid-cycle by design. Decisions happen in weeks, not semesters. The Snapshot’s scope is intentionally tight so you get usable answers in time to influence procurement, renewals, or grant milestones. 
  • Built on your existing data. No elaborate data-collection burden. We leverage what districts already track (benchmarks, grades, attendance, behavior) and connect it cleanly to exposure and implementation.
  • Clear comparisons, plain language. We specify the comparison, control for key confounds, and narrate limitations without jargon. When randomization isn’t possible, we match, we test robustness, and we show you the sensitivity of results.
  • Flexible across programs. Whether it’s a math app, an SEL coaching model, a tutoring service, or an attendance initiative, the Snapshot approach adapts to your context while keeping the logic consistent.

View a Sample MomentMN Snapshot Report Today

Budgets are tight. Student needs are urgent. Decisions can’t wait. Good programs deserve fair, rigorous reads—not vague dashboards or studies that arrive after the vote. When you tighten your theory of change, align implementation and timing, focus on outcomes, and build credible comparisons, educational impact evaluations become what they should be: practical tools for data-driven purchasing and learning out loud. 

If you want that clarity without the 200-page detour, a MomentMN Snapshot Report gives you a fast, independent answer you can trust and explain. 

If you’re ready to stop the guessing and start knowing what works for your students, request a sample report from the Parsimony team. We’ll walk you through how a Snapshot Report can fit your upcoming budget cycle, program review, or vendor negotiation. Bring your data, and we’ll handle the rest so you walk into your next meeting feeling confident, clear, and in control.

Experience an Easier Way To Get Rigorous Evidence of Your Impact

Have questions? Want a demo?

Book a call with Dr. Amanuel Medhanie, who will answer your questions and show you around the Snapshot Report service.