7: How do you evaluate policy research?
This chapter starts by discussing the challenges of policing, a profession with significant knowledge gaps that scholarly research has not yet filled. It discusses the merits of doing background research before embarking on a project, and the value of understanding internal validity in studies of cause and effect. The chapter outlines various potential threats to this validity, such as selection effects, confounding, and regression to the mean. It also describes the Sports Illustrated jinx and details the levels of the evidence hierarchy for policy decision-making.
Glossary terms in this chapter
Plausible mechanism: A plausible mechanism is a reasonable or persuasive process that links a cause to an effect.
Internal validity: The legitimacy of the inferences we make about the causal relationship between two things. Strong internal validity means we can be confident that a change in one thing caused the change observed in the other.
Selection effects: A problem that arises when observed effects reflect differences that already existed between the people or places receiving the treatment and those in the comparison group.
Temporal order: The challenge of establishing which variable changed first. Getting this wrong can cause confusion about whether A caused a change in B, or B caused a change in A.
Confounding: When an apparent effect was actually caused by some other event occurring at the same time as the treatment intervention.
Trend and seasonality: Errors can occur when evaluators do not consider the normal behavior of the phenomenon being observed. What looks like a treatment effect is often just the continuation of a pre-existing trend.
Measurement effect: If the method of measuring the outcome changes over time, this can be mistaken for an effect.
Testing effect: When the act of testing or studying something before the intervention itself affects the post-test measure.
Hawthorne effect: A change in people's performance that is attributable to their being observed by researchers, rather than to the intervention itself.
Attrition: If people or places are lost or removed from a study, the pattern of loss can create an apparent effect.
Regression to the mean: When treatment areas or people are selected because they score particularly high (or low), the tendency to revert to a more moderate value can be mistaken as a treatment effect.
Evidence hierarchy: A scale of quantitative research methods where larger scores indicate study methodologies that are likely to have stronger internal validity and value for policy makers looking to evaluate research studies.
Methodological quality: The extent to which the design and conduct of a study has been able to prevent systematic problems that might affect the trustworthiness of the results.
Counterfactual: Counterfactual areas or groups can be used to represent what would have happened in the absence of an initiative, and better estimate the real impact on the areas or people receiving the intervention.
Systematic review: A type of ‘study of studies’ that addresses a specific question in a systematic and reproducible way, by identifying and appraising all of the literature around the topic area.
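To make the counterfactual idea concrete, here is a minimal sketch of one common way of using a comparison area. The technique is a simple difference-in-differences calculation, which I have chosen for illustration. The crime counts are hypothetical. The approach assumes the treatment area would have followed the same trend as the comparison area if there had been no initiative. This is a big assumption, and it is exactly why comparison areas need to be chosen carefully.

```python
def difference_in_differences(treat_before, treat_after, comp_before, comp_after):
    """Estimate an initiative's effect by subtracting the change in the
    comparison (counterfactual) area from the change in the treatment area.
    Assumes both areas would otherwise have followed the same trend."""
    treatment_change = treat_after - treat_before
    counterfactual_change = comp_after - comp_before
    return treatment_change - counterfactual_change


if __name__ == "__main__":
    # Hypothetical burglary counts, one year before and one year after an initiative.
    naive = 80 - 100  # simple before/after change in the treatment area only
    did = difference_in_differences(100, 80, 100, 85)
    print(f"Naive before/after change: {naive}")          # -20
    print(f"Change relative to comparison area: {did}")   # -5
```

In this made-up example, the treatment area dropped by 20 burglaries, which looks impressive. However, the comparison area dropped by 15 without any intervention. Only about 5 of those 20 are plausibly attributable to the initiative. The rest looks like a pre-existing downward trend.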
Additional information and links
Glasgow ice-cream wars
Yes, this was a real thing. There is a Wikipedia page on the subject, and one of Scotland's newspapers has a short summary of this dark little chapter in Glasgow's history. British TV station Channel 4 has a documentary, 'The Ice Cream Wars', centered on TC Campbell and Joe Steele, the two men convicted of the 1984 murders of the Doyle family in Glasgow. The entire documentary appears to be on YouTube.
Internal and external validity
This medical journal article has a clear description of internal and external validity, along with this simple but effective graphic.
I cannot speak to the authenticity of the authorship here (think ABCDE checklist), but this short summary of threats to internal and external validity covers many of the common concerns.
The Sports Illustrated jinx
Jugs Sports (a pitching machine company) has a list of some of the more famous examples of the Sports Illustrated jinx, in case you want to wallow in the misery of sporting failures. And of course, make sure you check out Sports Illustrated's own discussion of the jinx in a 20-year-old article. It even has a copy of the infamous cover.
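The jinx is usually explained as regression to the mean. Athletes tend to make the cover after an exceptional stretch, and part of any exceptional stretch is luck. The same trap catches evaluators who pick the "worst" places for an intervention. The short simulation below is a minimal sketch of this idea. All of the numbers are made up. It assumes each place has a stable underlying crime level plus random year-to-year noise. No intervention is applied at all.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

N_PLACES = 1000
# Hypothetical model: each place has a stable underlying crime level,
# and each year's observed count adds random noise to that level.
true_level = [random.gauss(50, 10) for _ in range(N_PLACES)]
year1 = [level + random.gauss(0, 10) for level in true_level]
year2 = [level + random.gauss(0, 10) for level in true_level]

# Pick the 50 places with the highest year 1 counts, as if choosing
# hot spots for treatment. Nothing is done to them.
worst = sorted(range(N_PLACES), key=lambda i: year1[i], reverse=True)[:50]

mean_y1 = statistics.mean(year1[i] for i in worst)
mean_y2 = statistics.mean(year2[i] for i in worst)
print(f"Selected places, year 1 mean: {mean_y1:.1f}")
print(f"Same places, year 2 mean:     {mean_y2:.1f}")
```

The selected places score noticeably lower in year 2, even though nothing changed. They were partly selected for their bad luck in year 1, and that luck does not repeat. If a program had been introduced in those places, this drop could easily be mistaken for a treatment effect.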
The evidence hierarchy
As this article points out, most research hierarchies have focused on evaluating the effectiveness of interventions. They are not suitable for telling you whether an intervention is appropriate or feasible. Historical side note: the article claims that research hierarchies were popularized in 1979 by the Canadian Task Force on the Periodic Health Examination.
I have a number of videos that support the book. Most are only available to instructors for use in class (because the instructor is available to add necessary context), but for this topic, here is the video for the evidence hierarchy.
Related Reducing Crime podcast episode
The chapter's discussion of internal and external validity brought to mind my conversation with renowned researcher Don Weatherburn, who spent much of his career in the part of the Venn diagram where research and policy overlap. How do you take studies that have good internal validity and spread them to the policy world (external validity)?
Don Weatherburn is now a professor at Australia's National Drug and Alcohol Research Centre, but for most of his career he ran the New South Wales Bureau of Crime Statistics and Research in Sydney. There he played a pivotal role in informing crime and policing policy at the highest levels of government. We talk about his experience and insights from working with practitioners in such a high-profile public capacity.