Resources
Recommended reading and other resources for safety risk management.
Original recommendations
- How Complex Systems Fail, PDF - a short introduction by Richard Cook applying systems thinking to safety
- How Complex Systems Fail, Velocity 2012 - a talk by Richard Cook relating systems safety to IT
- Engineering a Safer World - a free MIT book introducing STAMP (Systems-Theoretic Accident Model and Processes) and STPA (System-Theoretic Process Analysis)
- MIT Partnership for a Systems Approach to Safety (PSAS)
Short essays
- Resilience Engineering - Erik Hollnagel’s account of the origins of Resilience Engineering
- Resilience Assessment Grid - recommended for its succinct description, at the beginning of the essay, of the four potentials of resilient performance: Respond, Monitor, Learn, Anticipate
- The NO view of ‘human error’ - argues that we should stop using ‘human error’ as an explanation for accidents/failures as it is not helpful
- From the coalface: an essay on the early history of sociotechnical systems - a blog post on how the idea of sociotechnical systems came from the study of coal mining in Britain and the insight that the “best work arrangements come out of seeking a match between technical and social elements of the modern day workplace”
Books
- Normal Accidents: Living with High-Risk Technologies - an older but influential book written by Charles Perrow, a sociologist, in the aftermath of the Three Mile Island accident; Perrow’s ideas of coupling and complexity are still valid today
- The Field Guide to Understanding ‘Human Error’ by Sidney Dekker - recommended by several people as an easily understandable introduction to resilience engineering concepts from safety; also available on O’Reilly
- Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations - Nicole Forsgren, Jez Humble, Gene Kim - primarily a DevOps book with some safety influences, namely Westrum’s concept of generative culture; it describes both the methodology and the results of research led by Dr. Nicole Forsgren
Notable academic papers
- Bainbridge, L. (1983). Ironies of automation, PDF - a classic and accessible paper on the downsides of automation that still holds up today
- Endsley, M. R. (1995). Toward a Theory of Situation Awareness in Dynamic Systems, PDF - Endsley’s main paper on Situation Awareness, a theoretical construct that is useful in understanding decisions made by operators in emergency situations
- Helmreich, R. L., Klinect, J. R., & Wilhelm, J. A. (1999). Models of threat, error, and CRM in flight operations, PDF - an excellent paper covering key aspects of modern aviation safety: Crew Resource Management, the Line Operations Safety Audit, and Threat and Error Management
- Garvin, D. A., Edmondson, A. C., & Gino, F. (2008). Is yours a learning organization?, PDF - a solid methodology for assessing the depth of learning within your organization, with links to self-assessments
- Aven, T., Renn, O., & Rosa, E. A. (2011). On the ontological status of the concept of risk, PDF - Aven has devoted his career to establishing safety as a science; this article sets out to define what risk actually is
- Hollnagel, E., Wears, R. L., & Braithwaite, J. (2015). From Safety-I to Safety-II: a white paper, PDF - an evolution of Hollnagel’s concept of Resilience Engineering, making the case that safety should focus not just on accidents (when things go unexpectedly poorly), but on the full range of outcomes
- Hollnagel, E. (2014). Is safety a subject for science?, PDF - an earlier paper by Hollnagel that introduces Safety-II by arguing that we can’t have a science based on the non-occurrence of events (accidents)
- Dekker, S. W. A. (2017). Rasmussen’s legacy and the long arm of rational choice, PDF - the paper explores the moral aspects behind our tendency to blame people for causing accidents, and how blame can be harmful
- Repenning, N. P., & Sterman, J. D. (2001). Nobody Ever Gets Credit for Fixing Problems that Never Happened: Creating and Sustaining Process Improvement, PDF - an analysis of a challenge that faces many risk programs: why process improvement programs fail or succeed
- Rae, A., Provan, D., Aboelssaad, H., & Alexander, R. (2020). A manifesto for Reality-based Safety Science, PDF - a call for development of theories that can be empirically tested and are useful to practitioners, including a list of commitments for future research
- Provan, D. J., Woods, D. D., Dekker, S. W. A., & Rae, A. J. (2020). Safety II professionals: How resilience engineering can transform safety practice, PDF - a proposal for changing safety programs to adopt principles of Safety-II (also applicable to information risk management)
Other Media
Videos, podcasts and other media.
- The Safety of Work - a now biweekly podcast where Drew Rae and David Provan, two safety practitioners and researchers, review academic research and discuss the implications for safety management
- The Key to High Performance: What the Data Says - Nicole Forsgren presents results of her research at DevOps Enterprise Summit San Francisco 2017
- 2019 Accelerate State of DevOps Report - the latest (and likely last) State of DevOps Report produced under the direction of Nicole Forsgren
- The STPA Handbook - a whitepaper written by Nancy Leveson and John Thomas to help practitioners learn to use STPA
Graduate Degree Programs
Three programs with graduates active in the IT Resilience Engineering community:
- Human Factors & System Safety at Lund University, Sweden - this is where John Allspaw (thesis) and others active in the learning from incidents community have pursued their degrees, including J Paul Reed (thesis). A 1- or 2-year program with mandatory on-site learning labs.
- Cognitive Systems Engineering at The Ohio State University - David Woods is on faculty, and Laura Maguire completed her PhD here (talk based on her work)
- Managing Risk and System Change at Trinity College Dublin, Ireland - I am currently pursuing my MSc in Psychology here, and will post my thesis when it’s done! A 2-year master’s program, all online. A broader curriculum than Human Factors & System Safety, covering: human factors and sociotechnical systems safety, organizational change, safety risk assessment and risk management, design, organizational psychology and leadership, human resources, statistics, and research methodology.
Organizations
- Resilience Engineering Association: the official home of Resilience Engineering. “Resilience Engineering is a trans-disciplinary perspective that focuses on developing theories and practices that enable the continuity of operations and societal activities to deliver essential services in the face of ever-growing dynamics and uncertainty. It addresses complexity, non-linearity, inter-dependencies, emergence, formal and informal social structures, threats and opportunities.”
- Society for Risk Analysis (SRA): “The Society for Risk Analysis is a multidisciplinary, interdisciplinary, scholarly, international society that provides an open forum for all those who are interested in risk analysis. Risk analysis is broadly defined to include risk assessment, risk characterization, risk communication, risk management, and policy relating to risk, in the context of risks of concern to individuals, to public- and private-sector organizations, and to society at a local, regional, national, or global level.”
- Society of Information Risk Analysts (SIRA): Data > Dogma. “The Society of Information Risk Analysts (SIRA), established in 2011, is the go-to resource for decision makers & practitioners of information risk management. We endeavor to do this by supporting the collaborative efforts of our members through research, knowledge sharing, and member-driven education.”