Q: If I were to set up a study examining the effects of an asset-building community, how would I go about testing the efficacy of this initiative?
A: We all know (or should know) that the gold standard for scientific studies is random assignment – that is, to understand the “true” effect of some intervention/treatment/condition, you randomly assign people to two groups (one that gets the intervention, one that doesn’t), measure these folks on whatever it is you’re interested in, implement the intervention, and then measure everyone again. Differences that show up after the intervention, but were not there before it, are interpreted as being “caused” by the intervention. Nice and simple, yes?
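To make that logic concrete, here is a minimal sketch in Python with purely made-up numbers (not data from any actual study): a thousand hypothetical people, a coin-flip assignment, and an assumed “true” effect of +2. Because the coin flip decides who gets the intervention, the two groups start out equivalent on average, and a simple treated-versus-control comparison at follow-up recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Purely hypothetical numbers: 1,000 people, baseline score ~N(50, 10),
# randomly assigned to treatment by a coin flip. The assumed "true" effect is +2.
n = 1000
true_effect = 2.0
baseline = rng.normal(50, 10, n)
treated = rng.random(n) < 0.5                      # random assignment
followup = baseline + rng.normal(0, 5, n) + true_effect * treated

# Because assignment is random, a simple treated-vs-control comparison
# of follow-up scores is an unbiased estimate of the effect.
estimate = followup[treated].mean() - followup[~treated].mean()
print(f"True effect: {true_effect:+.1f}, experimental estimate: {estimate:+.2f}")
```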
Well, yeah, kind of. But random assignment of people is wildly different from random assignment of communities. For various and sundry reasons, this empirical gold standard is not very helpful to those of us using the community as the level of analysis (vs. looking at the person). So given that random assignment is off the table, what other research design options are at our disposal to address the seemingly simple question posed above? And do these options come with their own biases that obscure our results?
An interesting study that came out about 15 years ago conducted a randomized experiment on a welfare-to-work program, but also had the data to test alternative forms of comparison. Thus, the researchers had the “true” differences (as found via random assignment), plus the resources and data to construct other kinds of comparisons and see how those results diverged from the “true” ones. Put another way – they were able to test quasi-experimental comparisons against experimental results!! How wonderfully research-meta is that? If that doesn’t get your research heart palpitating, I just don’t know what will.
So what were their other comparison groups? They identified four:
| Comparison Group | Brief Description | Analog to Community Comparison | Findings (% of comparisons diverging from the experimental results) |
| --- | --- | --- | --- |
| Across-site unmatched | Compared people in the welfare-to-work program with a group of people from a different state | Comparing an HCHY initiative to some random community in the country | 47% |
| Across-site matched | Similar to the above, except the comparison group was selectively chosen on sociodemographic variables to closely match the participants in the welfare-to-work program | This would be like comparing the Children First initiative in St. Louis Park to another non-HCHY community that looks a lot like St. Louis Park (e.g., population, income) | 38% |
| Within-site across-cohort | Simply compared what people looked like before they went into the program with what they looked like after the program | Most commonly known as a pre/post-test design, this would be like studying a community before an initiative was implemented, implementing it, and then assessing the community some time after implementation | 29% |
| Within-site across-office | There were several welfare-to-work offices within the same city, so they compared program participants at one city site with a comparison group of non-program people from a different city site | This would be akin to comparing an HCHY community with a non-HCHY community in close proximity (e.g., St. Louis Park with Edina) | 13% |
Their study yielded a number of fairly disappointing findings. Regardless of the method of comparison used, they uncovered biases not just in the magnitude of the differences (i.e., whether the differences in the alternative comparison groups were bigger or smaller than the “real” differences found via random assignment) but, far more troublingly, in the direction of effects. This latter finding is represented in the rightmost column of the table above. What these percentages reflect is the proportion of comparisons in which the alternative comparison groups either showed no effect where there was one, or showed an effect that went in the opposite direction. Thus, almost 1 out of every 2 comparisons (47%) in the across-site unmatched group differed from what was found with random assignment. Even in the best case, the within-site across-office group, 13% of all comparisons would have led to a completely different interpretation. That may seem low, but it’s huge from a statistical perspective: you essentially have very little confidence in the “real” nature of your data. This is what leads to ulcers and gray hair among researchers.
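To see how a sign flip like that can happen, here is a toy sketch with invented numbers (again, not the study’s data), mimicking an across-site unmatched comparison: the program community starts out more disadvantaged than the comparison community, so even though the program is assumed to have a genuine positive effect, the naive comparison comes out negative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Purely hypothetical numbers: the program community starts out more disadvantaged
# (baseline ~N(47, 10)) than an unmatched comparison community (~N(50, 10)).
# The assumed "true" program effect is +1, but the baseline gap swamps it.
n = 1000
true_effect = 1.0
program_followup = rng.normal(47, 10, n) + rng.normal(0, 5, n) + true_effect
comparison_followup = rng.normal(50, 10, n) + rng.normal(0, 5, n)

# The naive across-site comparison attributes the whole gap to the program,
# and here it even flips the sign of the estimated effect.
naive_estimate = program_followup.mean() - comparison_followup.mean()
print(f"True effect: {true_effect:+.1f}, across-site unmatched estimate: {naive_estimate:+.2f}")
```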
The authors of this study aren’t very upbeat about their findings:
> Community-wide programs present special problems for evaluators because the “nectar of the gods” – random assignment of individuals to program treatment and to a control group – is beyond their reach…Our review of the experience to date with alternative methods is generally discouraging; no clear second-best method emerges from the review. (Hollister & Hill, 1995, p. 158).
Now this doesn’t mean that we shouldn’t try to evaluate or assess or research the effects of community initiatives; rather, it suggests that we certainly need to understand the challenges in conducting this work, and continue to be creative about how we go about gathering and interpreting our data. So, from a certain perspective, this conundrum makes this research work even more exciting, as we’re forced to think of new and innovative ways of understanding these processes. And for researchers, that’s pretty darn cool.
