Evaluating with a control group

Evaluating with a control group means measuring outcomes at the start and end of an intervention and estimating what would have happened if it had not occurred. By including a comparable group that does not receive the program (or receives a different service), you can attribute observed changes to the program rather than to external factors with more confidence. This kind of evaluation helps you scale what works, redesign what does not, and remain accountable to funders, boards of directors, and public entities with robust evidence, while upholding ethical standards in selection, consent, and data protection.

What is it for and when is it appropriate?

The central purpose is to attribute change to the program with greater rigor. In practice, it lets you answer how much participants improved because of the program, compared with a similar group that was not exposed to it. It is appropriate when major decisions are at stake, such as scaling a public policy, validating a model for impact investment, or allocating budget between alternatives.

It is also useful when results may be influenced by external shocks (economic crises, seasonality, new regulations, etc.). With a well-designed control group, you separate what would have happened even without the program from what happens because the intervention exists.

Common designs (from most to least controlled)

Not every context allows the same design. Here, “more controlled” means we are more confident that the groups are comparable and rely less on assumptions; “less controlled” implies more dependence on what has already happened and on assumptions about the data. These are the common designs and what each requires:

Randomized controlled trial (RCT): randomly assigns who receives the program and who does not, which makes the groups comparable on average. It requires clear criteria, ethical allocation, and sufficient statistical power.
Difference-in-differences (DiD): compares how outcomes evolve over time for treated and untreated cases. It requires plausible parallel trends and measurements before and after the intervention.
Regression discontinuity (RDD): takes advantage of an eligibility threshold (e.g., a score). Close to the cutoff, the groups are comparable. It requires enough cases near the threshold.
Matching/propensity score (PSM): builds a control group that is similar on observed characteristics. It does not correct for differences in unobserved variables and requires a good set of covariates.
Instrumental variables (IV): uses an external variable that affects participation but affects the outcome only through participation. Useful, but it requires strong assumptions and verification.
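
To make the difference-in-differences logic concrete, here is a minimal Python sketch of the simplest two-group, two-period case. The function name did_estimate and all the scores are hypothetical, chosen only for illustration. The estimate is only meaningful if the parallel-trends assumption is plausible, and a real analysis would also report uncertainty.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on outcome lists."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what would have happened
    # to the treated group without the program (parallel-trends assumption).
    return treated_change - control_change

# Hypothetical scores, illustrative numbers only (not real data).
treated_pre = [50, 52, 48, 50]
treated_post = [60, 62, 58, 60]
control_pre = [49, 51, 50, 50]
control_post = [53, 55, 54, 54]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # treated change (10) minus control change (4) = 6
```

In this made-up example the treated group improves by 10 points, but 4 of those points also appear in the control group, so only 6 are attributed to the program.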

How to implement it step by step

Before thinking about formulas, ground the process in people and operations. Define whom you are going to serve, why, and the minimum amount of the program each person needs to receive (hours, sessions, visits, etc.). Align your theory of change, the step-by-step roadmap for how the program improves people's lives, with what you can realistically deliver in the field.

The most difficult step is choosing the appropriate methodology, which depends on several questions: Is a lottery possible? Is there a rule or score that defines who gets in and who does not? Can you collect a baseline in both groups and ensure comparable follow-up?

Implement in an orderly manner: use the same baseline for treatment and control, and an equivalent endline questionnaire.

Minimal checklist (to stay on track)

Transparent eligibility and written criteria.
Baseline and endline in both groups, with the same scales and time windows.
Compliance monitoring (who received what and how much).
Contamination log (control-group members who accessed the service by other means).
Analysis plan defined before looking at the data (metrics, subgroups, handling of missing data).
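
If a lottery is feasible, the draw itself can be made transparent and reproducible. The following Python sketch is one possible approach, not a prescribed procedure: the function lottery_assign, the participant IDs, and the seed value are all hypothetical. Recording the seed in the written criteria lets anyone rerun the draw and get the same result.

```python
import random

def lottery_assign(participant_ids, n_treatment, seed):
    """Assign n_treatment eligible participants to treatment by lottery."""
    rng = random.Random(seed)          # dedicated generator, fixed seed
    ordered = sorted(participant_ids)  # fixed starting order: result depends only on the seed
    rng.shuffle(ordered)
    treatment = set(ordered[:n_treatment])
    return {
        pid: ("treatment" if pid in treatment else "control")
        for pid in sorted(participant_ids)
    }

# Hypothetical list of eligible participants and a documented seed.
eligible = [f"P{i:03d}" for i in range(1, 11)]
assignment = lottery_assign(eligible, n_treatment=5, seed=2024)

for pid, arm in assignment.items():
    print(pid, arm)
```

Sorting the IDs before shuffling means the outcome does not depend on the order in which the list happened to be stored.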

Quality, ethics and responsible communication

Attribution does not justify unethical practices. Do not exclude people without a clear and justifiable reason, and communicate clearly what participating (and not participating) involves. Request consent and offer alternatives to those who are not served, such as deferred admission, standard services, or referral to partner organizations.

Be aware of the risks: attrition (people leaving the study along the way), external shocks, and measurement error. Mitigate them with active follow-up, context logs, pilots, and quality audits. Report what you find without promising more than the data show: a point estimate always comes with uncertainty (confidence intervals) and with assumptions that must be made explicit.
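
One simple way to keep an eye on dropout is to compare attrition rates between groups, since losing far more people from one group than the other can undermine comparability. The Python sketch below is illustrative only: the function name, the counts, and the 0.05 flagging threshold are assumptions, not standards taken from this article.

```python
def attrition_rates(baseline_counts, endline_counts):
    """Share of baseline participants with no endline measurement, per group."""
    return {
        arm: 1 - endline_counts[arm] / baseline_counts[arm]
        for arm in baseline_counts
    }

# Hypothetical counts, illustrative only.
baseline = {"treatment": 200, "control": 200}
endline = {"treatment": 180, "control": 150}

rates = attrition_rates(baseline, endline)
gap = abs(rates["treatment"] - rates["control"])

FLAG_THRESHOLD = 0.05  # assumed cutoff for this sketch; set your own in the analysis plan
needs_review = gap > FLAG_THRESHOLD

print(rates, round(gap, 2), needs_review)
```

Here the control group loses 25% of its members against 10% in treatment, a gap large enough to warrant a closer look at who dropped out and why.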

Before publishing, validate results with technical teams and, when appropriate, share findings with the participating population in a clear and useful way.

What to expect from the findings?

An evaluation with a control group provides attributable effects: average differences between treatment and control after the intervention, with their magnitude and precision. It can also reveal heterogeneity of effects (which segments changed the most) and help estimate cost-effectiveness ratios.

Effect size: how much the indicator changes in natural units or standard deviations.
Precision: confidence intervals and significance; avoid drawing conclusions from small, unstable differences.
Segmentation: analyze subgroups (age, gender, intensity of use, territory) with caution so as not to inflate false positives.
Operational implication: what to adjust, what to scale, what to abandon; direct link with budget decisions.
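
As a rough illustration of effect size and precision, the Python sketch below computes a difference in means, an approximate 95% confidence interval, and a standardized effect (Cohen's d) from endline scores. The function name and the data are hypothetical. The interval uses a normal approximation, which is an assumption of this sketch; with samples this small, a t-distribution would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def summarize_effect(treatment, control, confidence=0.95):
    """Difference in means with a normal-approximation CI and Cohen's d."""
    n_t, n_c = len(treatment), len(control)
    var_t, var_c = stdev(treatment) ** 2, stdev(control) ** 2
    diff = mean(treatment) - mean(control)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(var_t / n_t + var_c / n_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Pooled standard deviation for the standardized effect size.
    pooled_sd = sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    return {
        "difference": diff,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
        "cohens_d": diff / pooled_sd,
    }

# Hypothetical endline scores, illustrative only.
treatment_scores = [62, 58, 65, 60, 63, 59, 64, 61]
control_scores = [55, 57, 54, 58, 56, 53, 59, 56]

result = summarize_effect(treatment_scores, control_scores)
print(result)
```

Reporting the interval next to the point estimate makes it harder to overstate a small or unstable difference.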

Conclusion

Evaluating with a control group lets you attribute with confidence and make big decisions without guessing: scale what works, redesign what doesn't, and be accountable with clear evidence. If you are looking for a feasible, ethical, and robust design for your program, from the lottery to the analysis, let's talk.