Evaluating over time means measuring at the beginning and at the end of an intervention to find out whether people changed during the program. It is well suited to monitoring progress, adjusting operations, and being accountable with clear data. It does not prove causality, but it offers solid evidence of progress when designed well: consistent indicators, comparable measurement points, and attention to recall and seasonality biases. It also supports a retrospective baseline when an initial measurement was not possible.

What is it for (and what is it not for)?

Its purpose is to answer whether changes occurred in the population served, and it connects directly with daily management. By seeing which indicators change and which remain stable, you can adjust interventions (content, channels, schedules or support) or the strategy as a whole (theory of change). It is especially useful when you need to show results (what takes time to happen) beyond outputs (what happens immediately).

It serves to demonstrate progress, not to attribute that progress exclusively to the program. If your decision requires causality, you need an evaluation with a control group. If your priority is learning and improving, evaluating over time is the right choice.

Measure well: instruments, timing and data quality

Quality does not depend on “long questionnaires,” but on questions that capture real behavior. Prioritize brief instruments that can be completed in the time respondents have available.

In surveys, use clear questions and keep the same response scale throughout the questionnaire (for example: 1-5 on all questions). Mix outcome items (example: last month's income) with adoption signals (example: whether the respondent completed 2 course modules in the last 14 days). Complement them with interviews or focus groups to understand the why behind the numbers; those qualitative findings will tell you what to adjust in delivery.

Timing must match the pace of change: some results appear in weeks (knowledge, habits), others in months (income, employability). Don't measure too soon, and don't wait so long that the effect is diluted. Fewer instruments, measured well, generate more useful data than cumbersome batteries that cannot be repeated.

What are baselines and what are they for?

A baseline is the initial measurement taken at the start of the program with the same instruments that will be used at closure. Its function is to set the starting point for comparison over time, to segment, and to monitor outputs (what happens immediately) and results (what takes time to happen). It also helps to adjust goals and detect biases early.

There are scenarios where the program has already started and no baseline was collected at the start. In these cases, the measurement relies on a retrospective baseline: at closing we ask participants how they were before the program and how they are now, and we collect self-attribution (how much of the change people attribute to the program). It does not prove causality, but it provides a useful signal for learning and adjusting.

| Aspect | Standard baseline | Retrospective baseline |
| --- | --- | --- |
| Measurement timing | Collected at the beginning and at the end with the same instrument. | Collected at closing, asking about the “before” and the “after.” For example: current monthly income and monthly income before the program, on the same scale. |
| Goal | Establish a starting point and “clean” comparability. | Reconstruct the starting point when there was no initial measurement. |
| What and how to report | Report the observed change between baseline and closure, separating outputs and results; include intervals if applicable. | Present two estimates: reported change (before→after) and self-attribution (percentage or scale). Explain possible recall biases and how you mitigated them (anchors, triangulation, scale consistency). |
| Advantages | Greater precision and comparability over time; less recall bias. | Allows learning without losing the cycle (implementation period and ongoing measurement); useful for obtaining information even when the scenario was not ideal. |
| Limitations | Requires planning and budgeting from the start. | Depends on people's memory, and responses may be affected by social desirability bias; does not allow the change to be attributed solely to the program. |
| Good practices | A calendar that avoids seasonal biases (seasonal peaks), standardized training of the collection team (same script and criteria), piloting and quality control. | Anchor to specific periods (“last 30 days”), use the same scales, record external evidence, and train collectors not to lead responses. |
| Example | “Between baseline and closure, indicator X rose Y%.” | “Between the recalled starting point and closure, average monthly income rose 18%; people attribute 7/10 of that change to the program; this does not imply causality.” |
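To make the two retrospective estimates concrete, here is a minimal Python sketch. It assumes a closing survey in which each person reports a recalled "before" income, a current income on the same scale, and a 0-10 self-attribution score. The field names (`income_before`, `income_now`, `attribution`) and all values are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical closing-survey responses (illustrative values only).
# Each person reports the recalled "before" and the current value on the
# same scale, plus a 0-10 self-attribution score.
responses = [
    {"income_before": 1000.0, "income_now": 1200.0, "attribution": 8},
    {"income_before": 1500.0, "income_now": 1650.0, "attribution": 6},
    {"income_before": 500.0, "income_now": 690.0, "attribution": 7},
]


def retrospective_summary(rows):
    """Return the two estimates: reported change and self-attribution."""
    before = mean(r["income_before"] for r in rows)
    now = mean(r["income_now"] for r in rows)
    return {
        "reported_change_pct": 100 * (now - before) / before,
        "mean_attribution": mean(r["attribution"] for r in rows),
    }


def report_sentence(summary):
    """Word the finding without causal language."""
    return (
        "Between the recalled starting point and closure, average monthly "
        f"income rose {summary['reported_change_pct']:.0f}%; respondents "
        f"attribute {summary['mean_attribution']:.0f}/10 of that change to "
        "the program. This does not imply causality."
    )


summary = retrospective_summary(responses)
sentence = report_sentence(summary)
print(sentence)
```

Keeping the reported change and the self-attribution as separate fields makes it harder to merge them into a single causal claim in the report.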

Common threats and how to mitigate them

Any before-after evaluation is exposed to external factors. Recognize and mitigate them:

  • History/external shocks: economic or climatic events can move your indicators. Document the context and, if possible, add questions about exposure to those events. Example: “Did you lose your job because of the plant closure?”
  • Maturation/learning: changes that occur simply with the passage of time. Define a minimum exposure period and justify it. Example: girls and boys improve their reading just by advancing a grade, even without a program.
  • Selection bias: if you only measure those who remained, you may overestimate effects. Record dropout and compare the profiles of those who leave with those who stay. Example: only those who completed the course respond to the survey.
  • Inconsistent measurement: changes in the questionnaire or in who collects the data. Standardize training and pilot the instrument. Example: at baseline you used a 1-5 scale and at closing you changed to 1-10.
  • Recall bias: people misremember their earlier situation. Anchor questions in facts and validate against records when they exist. Example: people tend to “inflate” the income they remember from a year ago.

These measures do not replace a comparison group, but they increase the credibility of the before-after comparison.
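The selection-bias check mentioned above (recording dropout and comparing profiles) can be sketched as follows. This is only an illustration under assumptions: the `completed` flag, the `baseline_income` field and the values are hypothetical, and a real check would compare several baseline characteristics, not just one.

```python
from statistics import mean

# Hypothetical baseline records with a flag for whether the person also
# answered the closing survey (illustrative values only).
participants = [
    {"baseline_income": 200.0, "completed": True},
    {"baseline_income": 400.0, "completed": True},
    {"baseline_income": 600.0, "completed": True},
    {"baseline_income": 100.0, "completed": False},
    {"baseline_income": 300.0, "completed": False},
]


def attrition_check(rows, field):
    """Report the dropout rate and the baseline mean of stayers vs. leavers."""
    stayers = [r[field] for r in rows if r["completed"]]
    leavers = [r[field] for r in rows if not r["completed"]]
    return {
        "attrition_rate": len(leavers) / len(rows),
        "stayers_mean": mean(stayers) if stayers else None,
        "leavers_mean": mean(leavers) if leavers else None,
    }


check = attrition_check(participants, "baseline_income")
print(check)
```

A large gap between the two baseline means suggests that those who stayed were different from the start, so the observed change should be read with more caution.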

How to analyze and interpret

Begin with descriptive statistics: mean, median, and distribution at baseline and at closing. Then calculate the differences (absolute and percentage) and analyze them by relevant segments (age, gender, area, intensity of use). Visualize with bar charts or violin plots so that outliers do not lead to misleading readings.
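As a rough illustration of these first steps, the sketch below computes the mean, median, and absolute and percentage change, overall and by segment. It assumes paired records, one per participant, measured with the same instrument at both points. The field names and values are hypothetical.

```python
from statistics import mean, median

# Hypothetical paired records: one row per participant, measured with the
# same instrument at baseline and at closure (illustrative values only).
records = [
    {"segment": "18-29", "baseline": 200.0, "closure": 260.0},
    {"segment": "18-29", "baseline": 300.0, "closure": 330.0},
    {"segment": "30+", "baseline": 400.0, "closure": 400.0},
    {"segment": "30+", "baseline": 500.0, "closure": 560.0},
]


def summarize(rows):
    """Descriptive statistics plus absolute and percentage change in the mean."""
    base = [r["baseline"] for r in rows]
    close = [r["closure"] for r in rows]
    abs_change = mean(close) - mean(base)
    return {
        "n": len(rows),
        "mean_baseline": mean(base),
        "mean_closure": mean(close),
        "median_baseline": median(base),
        "median_closure": median(close),
        "abs_change": abs_change,
        "pct_change": 100 * abs_change / mean(base),
    }


def summarize_by_segment(rows):
    """Apply the same summary within each segment."""
    segments = {}
    for r in rows:
        segments.setdefault(r["segment"], []).append(r)
    return {name: summarize(group) for name, group in segments.items()}


overall = summarize(records)
by_segment = summarize_by_segment(records)
print(overall)
print(by_segment)
```

Reporting the median next to the mean is a simple guard against a few extreme values driving the headline number.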

Include confidence intervals where the sample allows, and report migration between categories (e.g., from “unemployed” to “employed”). If you applied a retrospective baseline, clearly separate reported change from self-attribution (“X% say the change is largely due to the program”).
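One possible way to produce both outputs is sketched below: an approximate 95% confidence interval for the mean paired change, and a count of category migrations. The interval uses a normal approximation (z = 1.96), which is an assumption made here for simplicity; with small samples a t-based interval would be more appropriate. Function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def mean_change_ci(baseline, closure, z=1.96):
    """Approximate 95% CI for the mean paired change (normal approximation)."""
    diffs = [c - b for b, c in zip(baseline, closure)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired observations")
    centre = mean(diffs)
    margin = z * stdev(diffs) / sqrt(len(diffs))
    return centre - margin, centre, centre + margin


def migration_table(status_before, status_after):
    """Count how many people moved from each category to each category."""
    return Counter(zip(status_before, status_after))


# Illustrative values only.
baseline = [200.0, 300.0, 400.0, 500.0, 350.0]
closure = [260.0, 330.0, 400.0, 560.0, 380.0]
low, centre, high = mean_change_ci(baseline, closure)
print(f"mean change {centre:.1f} (approx. 95% CI {low:.1f} to {high:.1f})")

before = ["unemployed", "unemployed", "employed", "unemployed"]
after = ["employed", "unemployed", "employed", "employed"]
moves = migration_table(before, after)
for (start, end), count in sorted(moves.items()):
    print(f"{start} -> {end}: {count}")
```

The interval describes uncertainty in the observed change only; it says nothing about whether the program caused that change.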

In the narrative, avoid causal phrases (“thanks to the program”) and use honest formulations: “between the baseline and the closure, the average income increased 18%; without a comparison group it is not possible to attribute it solely to the intervention.”

From measurement to decision

Evaluation over time is a management tool. Turn the findings into concrete decisions:

  • Adjust delivery: content, schedules, modality, intensity.
  • Prioritize segments: double down where the change was greatest; redesign where there was no progress.
  • Realistic goals: update goals for outputs and results for the next cycle.
  • Bridge to greater rigor: if you need to attribute change to the program, plan a phase with a control group or a quasi-experimental design.

Measuring without deciding is useless; each indicator must trigger a clear action.

Conclusion

Evaluating over time organizes learning: it shows whether things are improving and where to adjust. Done with practical rigor (comparable measurements, attention to bias, honest reading), it allows you to improve today and prepare the ground for controlled measurements tomorrow. Do you want to put together a useful and realistic before-after plan for your program? Let's talk.