Risk assessment tools and criminal reoffending: Does bias determine who is “high risk”?
In the Netherlands in 1993, a man named Thomas was convicted of an arson in which no one was injured. Thomas was sentenced to four years in prison and involuntary treatment in a secure mental health facility. Based solely on professional opinion, Thomas was deemed too dangerous for release. If risk assessment tools had been available and used to evaluate Thomas, he would likely have been deemed a low risk to reoffend [1]. On the other hand, the professionals were firmly convinced that Thomas was dangerous. Would risk assessment tools, therefore, have led to a different outcome?
Risk assessment in the criminal justice system
What is the likelihood that someone who has committed a crime will reoffend? Courts and review boards (e.g., parole boards) seeking an answer to this question often want the input of criminal justice or mental health professionals, whose decisions affect both the freedom of individuals and public safety [2]. However, unstructured professional judgments about risk, which are based solely on a professional’s expertise, are frequently wrong. In fact, a report published in 1981 indicated that clinicians’ predictions about the risk of future violent behavior by mentally ill offenders who were released into the community were correct in only about one out of three cases [3]. Therefore, researchers have made significant efforts to develop and improve structured risk assessment tools, with the goal of improving the accuracy of risk predictions.
The use of risk assessment tools is now common practice in many aspects of criminal justice decision-making. Risk assessment tools are used by police, probation officers, psychologists, and psychiatrists to assess the risk of criminal offending, sexual offending, and violent offending in at least 44 countries [4]. These tools are also often used by professionals who make recommendations about whether an offender should be placed in long-term psychiatric care in a number of countries (e.g., sex offender civil commitment in the United States [5]; preventive detention in Canada [6]; and court-ordered hospitalization for long-term treatment (TBS) in the Netherlands [7]).
What risk assessment tools are and why they are used
Extensive research indicates that risk assessment tools improve the accuracy of professional judgments about the likelihood of future criminal behavior, violence, and sexual offending [8]. Actuarial risk assessment instruments (ARAIs) are one type of tool, and they are based on statistical models of weighted factors supported by research as being predictive of the likelihood of future offending. A risk score is calculated by assigning numeric values to risk factors such as criminal history, mental illness, and substance abuse problems, among many others. Some actuarial risk assessment tools include only static/historical risk factors, such as age of the offender and criminal history. However, some ARAIs also measure dynamic, changeable factors, such as pro-criminal attitudes. Actuarial risk assessment instruments tend to vary widely regarding the precise risk factors that are measured. Decisions about which factors to include depend on the tool developer, research findings, and the type of risk being measured. A second category of risk assessment tools involves structured professional judgment (SPJ), which guides the evaluator’s consideration of both static and dynamic risk factors, as well as the evaluator’s formulation of a risk management plan. However, unlike ARAIs, SPJ tools do not produce an automatic “risk score”; the evaluator determines the final risk estimate and makes suggestions about how the risk might be managed [9].
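To make the actuarial idea concrete, here is a minimal Python sketch of how a weighted-factor score might be computed and mapped onto a risk category. The factor names, weights, and cutoffs are invented for illustration and do not correspond to any published instrument.

```python
# Hypothetical weights: NOT taken from any real instrument.
HYPOTHETICAL_WEIGHTS = {
    "prior_convictions": 2,       # static factor
    "age_under_25": 1,            # static factor
    "substance_abuse": 1,         # dynamic factor
    "pro_criminal_attitudes": 2,  # dynamic factor
}

# Hypothetical cutoffs: (minimum score, label), checked from highest to lowest.
HYPOTHETICAL_CUTOFFS = [(4, "high"), (2, "moderate"), (0, "low")]


def actuarial_score(present_factors):
    """Sum the weights of the factors the evaluator coded as present."""
    return sum(HYPOTHETICAL_WEIGHTS[f] for f in present_factors)


def risk_category(score):
    """Map a numeric score onto the first cutoff it meets."""
    for minimum, label in HYPOTHETICAL_CUTOFFS:
        if score >= minimum:
            return label
    raise ValueError("score must be non-negative")


score = actuarial_score({"prior_convictions", "substance_abuse"})
print(score, risk_category(score))
```

Under a mechanical rule like this, two evaluators who code the same factors arrive at the same score. Discretion enters through how each factor is coded, and an SPJ tool would leave the final step, the risk estimate itself, to the evaluator.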
Structured risk assessment tools are also believed to reduce the likelihood the evaluator’s estimate of an offender’s reoffending risk will be influenced by bias. Bias is a systematic error in reasoning or logic that occurs as the result of the automaticity with which the human mind processes information based on expectations and experience [10]. Perhaps the most well-known example of this phenomenon is confirmation bias, which occurs when attention is drawn to evidence that supports a favored scenario or outcome, while evidence that weakens or contradicts the preferred hypothesis is discounted or ignored altogether [11]. For example, research suggests that criminal investigators and police trainees tend to view evidence such as witness statements, DNA evidence, and photo evidence as less credible and reliable if the evidence contradicts their beliefs about the guilt of the suspect [12] [13].
Structured risk assessment tools and bias
Bias on the part of an evaluator engaged in unstructured professional judgment [14] is believed to contribute to inaccurate predictions of risk [15]. Bias on the part of justice system officials who use their professional judgment to assess an individual’s risk of reoffending is also thought to contribute to the unfair treatment of minority groups [16] [17] [18]. Therefore, proponents of the use of risk assessment tools (particularly ARAIs) in criminal justice decision-making believe the tools will reduce the chances that a criminal offender will be treated unfairly based on stereotypes and bias [19].
However, there is currently limited information about whether risk assessment tools eliminate the influence of bias [20]. In fact, as discussed in this article, research about the use of ARAIs indicates several common practices that compromise both the accuracy and objectivity of risk evaluations and recommendations about interventions or treatment to help reduce risk. These findings suggest that a presumption of objectivity in the risk estimate or recommendations, solely because they appear to be the product of a structured risk assessment tool, is unwarranted.
There are two primary categories of research that address questions of whether and how bias affects the use of structured risk assessment tools. The first category we will refer to as “objectivity research.” Objectivity research addresses the influence of internal and external factors on evaluators’ use of risk assessment tools.
The second category we will refer to as “outcome research.” Outcome research relates to how risk assessment tools are completed and how risk scores are used by criminal justice actors (e.g., judges, probation officers) to make their decisions. When decisions are not aligned with risk scores, this suggests the potential influence of bias, which would undermine the accuracy of the very tools intended to curb its influence. In other words, the simple fact of completing a structured risk assessment instrument may not necessarily eliminate the influence of bias in an evaluator’s conclusions about an offender’s level of risk for reoffending.
Objectivity research: Moral judgments and being on the “right side”
The process of a risk evaluation can be influenced by the personal characteristics or attitudes of the evaluator (internal) and external sources of information that are of little or no relevance to accurate predictions of risk [21]. An example of an internal source of bias is when an evaluator is more strongly influenced by his or her moral judgment of the offender or his actions than by risk factors relevant to reoffending. Context is an example of an external source of potential bias, as is demonstrated when different evaluators reach different conclusions about the same offender, depending on the conclusion most favorable to the party by whom the evaluator is being paid.
Moral judgments by the evaluator about sexual offenders may affect his or her judgments about an offender’s risk of reoffending. In one recent study, for example, forensic psychologists and psychiatrists (n = 151) in the Netherlands generally assigned more importance to risk factors for sexual offenses that carry a negative moral connotation than to factors that do not [22]. Two factors with a negative moral connotation are an offender’s lack of empathy for the victim and lack of motivation for treatment. Experts assigned more weight to these factors than to other factors that are empirically more predictive of risk, such as never having an intimate relationship or exhibiting behavioral problems at school [23]. There is no demonstrated empirical link between lack of victim empathy and sexual offense recidivism [24], but this lack of empathy may be perceived as morally wrong (i.e., going against generally accepted values), thereby influencing the evaluators to consider these factors as important. Although this particular study did not utilize a risk assessment tool, the standard among the participating professionals is to utilize SPJ tools that require professional judgment about the importance of various risk factors [25]. These findings suggest that some experts may be influenced more by the moral dimensions of certain risk factors than by those factors for which there is more substantive scientific support.
Contextual factors as potential facilitators of bias can influence what an evaluator observes, his or her perception or interpretation of information, and thereby the conclusions he or she makes [26]. Studies about the scoring of ARAIs indicate that evaluators are not as objective regarding their observations and interpretations as they might believe [27]. For example, research regarding legal context has uncovered an “adversarial allegiance effect” [28] [29] [30]. The adversarial allegiance effect suggests ARAIs can be influenced by an evaluator’s commitment to a particular legal outcome [31], or subtle pressure to reach a particular conclusion [32]. In plain terms, evaluators appear to be influenced in their risk evaluations by the side that hired them (i.e., the prosecution or the defense) [33]. Although this effect appears to be more significant with factors that require professional judgment on the part of the evaluator (e.g., judgments of attitudes or mental status of the offender), the effect is still measurable with the use of tools that assess only static factors (e.g., number of previous offenses) [34]. What is surprising about this finding is that static factors should generate the same score among evaluators.
The research regarding moral judgments and adversarial allegiance does not imply that professionals intentionally manipulate results, but rather that they may not always be aware of the factors that influence their coding decisions or ultimate risk judgments. Limiting evaluator exposure to potentially biasing information (e.g., the referral source) and conscious efforts by the evaluator to consider alternative explanations early in the process may help minimize the effects of bias [35].
Case study: At the time Thomas was convicted, many psychological professionals believed that people who committed arson were very disturbed and sexually perverse individuals at very high risk to reoffend [36]. How might these views have affected scoring if risk assessment tools had been used to evaluate him?
Outcome research: Overrides, outgroups, and inertia
Criminal justice professionals who use ARAIs sometimes question the results and decide to override them [37]. A professional override in an ARAI is a discretionary decision to lower or raise the final risk judgment obtained by scoring the ARAI. When overrides are used to increase the risk category, this may be in an effort to protect the evaluator from potential blame if an offender goes on to commit a new crime [38]. Although some manuals for administering ARAIs allow evaluators to override the results based on their professional judgment [39], research indicates these decisions tend to significantly decrease the accuracy of the risk prediction [40] [41] [42]; overrides should therefore be used only in exceptional cases. Inadequately justified overrides in particular tend to decrease the predictive accuracy of ARAIs [43] [44] and run contrary to one of the primary reasons why risk assessment tools were developed.
Unfortunately, professional override decisions may also create opportunities for bias to overshadow the presumed objective nature of ARAIs [45]. For example, research indicates that overrides are not applied equally across different demographic groups. At least two published studies have found that overrides in risk assessment of juveniles for decisions about detention were associated with demographic characteristics, such as race and gender [46] [47]. In one study, African American youths were approximately 33% less likely to receive a downward override than white youths [48]. This is troubling: if overrides are applied in a biased manner, racial and ethnic minorities will likely bear a disproportionate share of the reduced accuracy that overrides introduce.
Evaluators also tend to use professional overrides to increase, rather than decrease, the risk level more often for some types of offenders than for others [49]. For example, studies reveal overrides are used in 33-74% of cases involving sexual offenders compared to 15-41% of cases involving nonsexual offenders [50] [51]. Professional overrides may therefore reflect biased judgments about an offender. Requiring evaluators to document the justification for an override should improve accountability in these risk judgments by making the stated reasons for the override available for review. Oversight of the stated justifications for overrides would make it possible to identify patterns in override use and to determine when their use is appropriate.
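One way to picture such oversight is a simple audit of override records. The Python sketch below assumes a hypothetical record layout and uses made-up example rows, not data from any of the studies cited here. It tallies how often overrides move the risk level up or down within each group and flags overrides that carry no documented justification.

```python
from collections import Counter
from dataclasses import dataclass

LEVELS = ["low", "moderate", "high"]


@dataclass
class Assessment:
    group: str               # hypothetical grouping, e.g. offense type
    tool_level: str          # level produced by scoring the instrument
    final_level: str         # level the evaluator actually reported
    justification: str = ""  # evaluator's stated reason for any override

    def direction(self):
        """Return 'up', 'down', or 'none' for this assessment."""
        diff = LEVELS.index(self.final_level) - LEVELS.index(self.tool_level)
        return "up" if diff > 0 else "down" if diff < 0 else "none"


def override_rates(assessments):
    """Per group, the share of cases overridden in each direction."""
    totals = Counter(a.group for a in assessments)
    counts = Counter((a.group, a.direction()) for a in assessments)
    return {
        group: {d: counts[(group, d)] / n for d in ("up", "down")}
        for group, n in totals.items()
    }


def undocumented(assessments):
    """Overrides recorded without any stated justification."""
    return [a for a in assessments
            if a.direction() != "none" and not a.justification.strip()]


# Made-up rows for illustration only.
records = [
    Assessment("sexual", "low", "moderate", "victim access"),
    Assessment("sexual", "low", "high"),
    Assessment("nonsexual", "moderate", "moderate"),
    Assessment("nonsexual", "high", "moderate", "stable housing"),
]
print(override_rates(records))
print(len(undocumented(records)))
```

A tally like this cannot say whether any single override was justified. It only makes lopsided patterns visible so that a reviewer can examine the stated reasons behind them.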
Case study: Given the views about arsonists at the time Thomas was convicted, an evaluator might nevertheless have decided to override the results of a risk assessment tool based on prevailing inaccurate beliefs about arson offenders.
Sometimes professionals simply disregard the results of the tool completely in making treatment and supervision recommendations [52] [53] [54] [55]. For example, a survey of over one thousand probation officers in the United States revealed that despite scoring a risk assessment instrument, about 40% of the officers reported they were unlikely to base their decisions or recommendations on the score [56]. Although using a risk assessment tool is no guarantee of objectivity in the evaluation, deliberately ignoring the results to substitute one’s own judgment compromises any potential benefits that might be realized by using a structured risk assessment approach.
Case study: If a risk assessment tool indicated that Thomas was a low risk to reoffend, that still may not have changed the judges’ decision that Thomas should spend an indeterminate amount of time in confinement in a psychiatric treatment facility.
Conclusion
Making accurate predictions about the likelihood of future criminal behavior is a complex task, and even the best statistical models of risk factors under ideal conditions yield accurate predictions in only about 66-74% of cases [57]. Furthermore, when evaluators are influenced by information unrelated to risk factors, or when they make adjustments to risk scores, the accuracy of the risk prediction decreases. Thomas was finally released in 2008, after spending fifteen years in confinement. Inaccurate risk assessment, as illustrated by Thomas’s case, can lead to significant unnecessary costs in terms of money and years of life lost.
The current objectivity and outcome research [58] [59] [60] [61] [62] suggests that when an evaluator uses a risk assessment tool, the results do not necessarily reflect an objective evaluation. Therefore, the identification of the sources and operation of evaluator bias and testing of the efficacy of debiasing strategies should be research priorities. Although the use of risk assessment tools improves accuracy over unstructured clinical judgment in estimating the likelihood of reoffending, the tools are not a panacea for bias. Risk assessment tools should be used to identify ways to manage and reduce risk - but their vulnerability to evaluator bias indicates they should not be used to justify significant deprivations of freedom.
References
[1] De Ruiter, C., & Hildebrand, M. (2017). Risicotaxatie. In P.J. van Koppen, J.W. de Keijser, R. Horselenberg, & M. Jelicic (Eds.), Routes van het recht: Over de rechtspsychologie (pp. 983-999). Den Haag: Boom juridisch.
[2] Monahan, J., Feshbach, S., Holder, W., Howe, R. A., Kittrie, N., Loevinger, J.,…Wasserstrom, R. (1978). Report of the Task Force on the Role of Psychology in the Criminal Justice System. American Psychologist, 1099-1113.
[3] Monahan, J. (1981). The clinical prediction of violent behavior. National Institute of Mental Health. DHHS Publication No. (ADM) 81-921. Washington, DC: U.S. Government Printing Office. 47-49.
[4] Singh, J. P., Desmarais, S. L., Hurducas, C., Arbach-Lucioni, K., Condemarin, C., Dean,…Otto, R. K. (2014). International perspectives on the practical application of violence risk assessment: A global survey of 44 countries. International Journal of Forensic Mental Health, 13(3), 193-206.
[5] Fabian, J. (2013). The Adam Walsh Child Protection and Safety Act: Legal and psychological aspects of the new civil commitment law for federal sex offenders. Cleveland State Law Review, 60, 307-364.
[6] Blais, J. (2015). Preventative detention decisions: Reliance on expert assessments and evidence of partisan allegiance within the Canadian context. Behavioral Sciences and the Law, 33, 74-91.
[7] Van Marle, H. J. C. (2002). The Dutch Entrustment Act (TBS): Its principles and innovations. International Journal of Forensic Mental Health, 1(1), 83-92. DOI: 10.1080/14999013.2002.10471163.
[8] Grove, W. H., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19-30.
[9] Guy, L. S., Packer, I. K., & Warnken, W. (2012). Assessing risk of violence using structured professional judgment guidelines. Journal of Forensic Psychology Practice, 12(3), 270-283.
[10] Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.
[11] Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175-220.
[12] Ask, K., & Granhag, P. A. (2007). Motivational bias in criminal investigators’ judgments of witness reliability. Journal of Applied Social Psychology, 37(3), 561-591.
[13] Ask, K., Rebelius, A., & Granhag, P. A. (2008). The ‘elasticity’ of criminal evidence: A moderator of investigator bias. Applied Cognitive Psychology, 22, 1245-1259.
[14] Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49(3), 323-330.
[15] Monahan, J. (1981). The clinical prediction of violent behavior. National Institute of Mental Health. DHHS Publication No. (ADM) 81-921. Washington, DC: U.S. Government Printing Office.
[16] Albonetti, C. A. (1991). An integration of theories to explain judicial discretion. Social Problems, 38(2), 247-266.
[17] Everett, R. S., & Wojtkiewicz, R. A. (2002). Difference, disparity, and race/ethnic bias in federal sentencing. Journal of Quantitative Criminology, 18(2), 189-211.
[18] Baumer, E. P. (2013). Reassessing and redirecting research on race and sentencing. Justice Quarterly, 30(2), 231-261. DOI: 10.1080/07418825.2012.682602.
[19] Hoge, R. D. (2002). Standardized instruments for assessing risk and need in youthful offenders. Criminal Justice and Behavior, 29(4), 380-396.
[20] Stevenson, M. (2017). Assessing risk assessment in action. George Mason Law & Economics Research Paper No. 17-36. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3016088
[21] Dror, I. E., & Murrie, D. C. (2018). A hierarchy of expert performance applied to forensic psychological assessments. Psychology, Public Policy, and Law, 24(1), 11-23.
[22] Huls, L., Nens, L., Rinne, T., & Verschuere, B. (2018). How reliable are the predictors of sexual recidivism? Moral considerations can color the judgement of pro Justitia reporters. Tijdschrift Voor Psychiatrie, 60(2), 78-86.
[23] Mann, R. E., Hanson, R. K., & Thornton, D. (2010). Assessing risk for sexual recidivism: Some proposals on the nature of psychologically meaningful risk factors. Sexual Abuse: A Journal of Research and Treatment, 22(2), 191-217.
[24] Vachon, D. D., & Lynam, D. R. (2016). Fixing the problem with empathy: Development and validation of the affective and cognitive measure of empathy. Assessment, 23, 135-149.
[25] Huls, L., Nens, L., Rinne, T., & Verschuere, B. (2018). How reliable are the predictors of sexual recidivism? Moral considerations can color the judgement of pro Justitia reporters. Tijdschrift Voor Psychiatrie, 60(2), 78-86.
[26] Saks, M. J., Risinger, D. M., Rosenthal, R., & Thompson, W. C. (2003). Context effects in forensic science: A review and application of the science of science to crime laboratory practice in the United States. Science & Justice, 43(2), 77-90.
[27] Zappala, M., Reed, A. L., Beltrani, A., Zapf, P. A., & Otto, R. K. (2018). Anything you can do, I can do better: Bias awareness in forensic evaluators. Journal of Forensic Psychology Research and Practice, 18(1), 45-56, DOI: 10.1080/24732850.2017.1413532.
[28] Blais, J. (2015). Preventative detention decisions: Reliance on expert assessments and evidence of partisan allegiance within the Canadian context. Behavioral Sciences and the Law, 33, 74-91.
[29] Murrie, D. C., Boccaccini, M. T., Johnson, J. T., & Janke, C. (2008). Does interrater (dis)agreement on Psychopathy Checklist Scores in sexually violent predator trials suggest partisan allegiance in forensic evaluations? Law and Human Behavior, 32, 352-362.
[30] Murrie, D. C., Boccaccini, M. T., & Tussey, C. (2009). Rater (dis)agreement on risk assessment measures in sexually violent predator proceedings: Evidence of adversarial allegiance in forensic evaluation? Psychology, Public Policy, and Law, 15(1), 19-53.
[31] Brodsky, S. L. (1991). Testifying in court: Guidelines and maxims for the expert witness. Washington, DC: American Psychological Association.
[32] Grisso, T. (1998). Forensic evaluation of juveniles. Sarasota, Florida: Professional Resources Press.
[33] Murrie, D. C., & Boccaccini, M. T. (2015). Adversarial allegiance among expert witnesses. Annual Review of Law and Social Science, 11, 37-55.
[34] Murrie, D. C., Boccaccini, M. T., & Tussey, C. (2009). Rater (dis)agreement on risk assessment measures in sexually violent predator proceedings: Evidence of adversarial allegiance in forensic evaluation? Psychology, Public Policy, and Law, 15(1), 19-53.
[35] Neal, T., & Brodsky, S. (2016). Forensic psychologists’ perceptions of bias and potential correction strategies in forensic mental health evaluations. Psychology, Public Policy, and Law, 22(1), 58-76.
[36] Harris, G. T., & Rice, M. E. (2011). Mentally disordered firesetters: Psychodynamic versus empirical approaches. International Journal of Law and Psychiatry, 7, 19-34.
[37] Miller, J., & Maloney, C. (2013). Practitioner compliance with risk/needs assessment tools. Criminal Justice and Behavior, 40(7), 716-736.
[38] Guy, L. S., Nelson, R. J., Fusco-Morin, S. L., & Vincent, G. M. (2014). What do juvenile probation officers think of using the SAVRY and YLS/CMI for case management, and do they use the instruments properly? International Journal of Forensic Mental Health, 13, 227-241.
[39] Harris, P. M. (2006). What community supervision officers need to know about actuarial risk assessment and clinical judgment. Federal Probation, 70(2), 8-14.
[40] Guay, J., & Parent, G. (2017). Broken legs, clinical overrides, and recidivism risk: An analysis of decisions to adjust risk levels with the LS/CMI. Criminal Justice and Behavior, 45(1), 82-100.
[41] Schmidt, F. S., Sinclair, S. M., & Thomasdóttir, S. (2016). Predictive validity of the Youth Level of Service/Case Management Inventory with youth who have committed sexual and non-sexual offenses. Criminal Justice and Behavior, 43, 413-430.
[42] Wormith, J. S., Hogg, S., & Guzzo, L. (2012). The predictive validity of a general risk-needs assessment inventory on sexual offender recidivism and exploration of the professional override. Criminal Justice and Behavior, 39, 1511-1538.
[43] Storey, J. E., Watt, K. A., Jackson, K. J., & Hart, S. D. (2012). Utilization and implications of the Static-99 in practice. Sexual Abuse: Journal of Research and Treatment, 24, 289-302.
[44] Wormith, J. S., Hogg, S., & Guzzo, L. (2012). The predictive validity of a general risk-needs assessment inventory on sexual offender recidivism and exploration of the professional override. Criminal Justice and Behavior, 39, 1511-1538.
[45] Chappell, A. T., Maggard, S. R., & Higgins, J. L. (2012). Exceptions to the rule? Exploring the use of overrides in detention risk assessment. Youth Violence and Juvenile Justice, 11(4), 332-348.
[46] Schmidt, F. S., Sinclair, S. M., & Thomasdóttir, S. (2016). Predictive validity of the Youth Level of Service/Case Management Inventory with youth who have committed sexual and non-sexual offenses. Criminal Justice and Behavior, 43, 413-430.
[47] Chappell, A. T., Maggard, S. R., & Higgins, J. L. (2012). Exceptions to the rule? Exploring the use of overrides in detention risk assessment. Youth Violence and Juvenile Justice, 11(4), 332-348.
[48] Chappell, A. T., Maggard, S. R., & Higgins, J. L. (2012). Exceptions to the rule? Exploring the use of overrides in detention risk assessment. Youth Violence and Juvenile Justice, 11(4), 332-348.
[49] Wormith, J. S., Hogg, S., & Guzzo, L. (2012). The predictive validity of a general risk-needs assessment inventory on sexual offender recidivism and exploration of the professional override. Criminal Justice and Behavior, 39, 1511-1538.
[50] Schmidt, F. S., Sinclair, S. M., & Thomasdóttir, S. (2016). Predictive validity of the Youth Level of Service/Case Management Inventory with youth who have committed sexual and non-sexual offenses. Criminal Justice and Behavior, 43, 413-430.
[51] Storey, J. E., Watt, K. A., Jackson, K. J., & Hart, S. D. (2012). Utilization and implications of the Static-99 in practice. Sexual Abuse: Journal of Research and Treatment, 24, 289-302.
[52] Bonta, J., Rugge, T., Scott, T., Bourgon, G., & Yessine, A. (2008). Exploring the black box of community supervision. Journal of Offender Rehabilitation, 47, 248-270.
[53] Krysik, J., & LeCroy, C. W. (2002). The empirical validation of an instrument to predict risk of recidivism among juvenile offenders. Research on Social Work Practice, 12, 71-81.
[54] Shook, J. J., & Sarri, R. C. (2007). Structured decision making in juvenile justice: Judges’ and probation officers’ perceptions and use. Children and Youth Services Review, 29, 1335-1351.
[55] Viljoen, J. L., Cochrane, D. M., & Jonnson, M. R. (2018). Do risk assessment tools help manage and reduce risk of violence and reoffending? A systematic review. Law and Human Behavior. Advance online publication. http://dx.doi.org/10.1037/lhb0000280.
[56] Miller, J., & Maloney, C. (2013). Practitioner compliance with risk/needs assessment tools. Criminal Justice and Behavior, 40(7), 716-736.
[57] Fazel, S., Singh, J. P., Doll, H., & Grann, M. (2012). Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24,827 people: systematic review and meta-analysis. British Medical Journal, 345:e4692.
[58] Austin, J. (1986). Evaluating how well your classification system is working. Crime & Delinquency, 32, 302-322.
[59] Gebo, E., Stracuzzi, N. F., & Hurst, V. (2006). Juvenile justice reform and the courtroom workgroup: Issues of perception and workload. Journal of Criminal Justice, 34, 425-433.
[60] Lyle, C. G., & Graham, E. (2000). Looks can be deceiving: Using a risk assessment instrument to evaluate the outcomes of child protection services. Children and Youth Services Review, 22, 935-949.
[61] Shook, J. J., & Sarri, R. C. (2007). Structured decision making in juvenile justice: Judges’ and probation officers’ perceptions and use. Children and Youth Services Review, 29, 1335-1351.
[62] Krysik, J., & LeCroy, C. W. (2002). The empirical validation of an instrument to predict risk of recidivism among juvenile offenders. Research on Social Work Practice, 12, 71-81.