Chaos & Resilience Engineering talk

15 Sep 2019 · @jabenninghoff

Update: by request, I’ve posted handouts for my Secure360 version of this talk here.

I’m giving a talk next Tuesday (9/24) at the September OWASP MSP Meeting on “Chaos & Resilience Engineering”. Because the talk is told as a story and a demo, I won’t be posting copies of the slides, but I am including an abstract and a list of references here. The talk tells the story of my journey to find chaos engineering, introduces chaos engineering, describes how it is complemented by resilience engineering, and discusses how to get started and join the movement.

Note: I presented a version of this talk at an internal company conference in 2019, which led me to create the Chaos & Resilience Engineering Guild. Later on, I left security and moved to Infrastructure to start a Site Reliability Engineering practice.

Abstract

Chaos engineering started at Netflix in 2011 with the invention of the Chaos Monkey, a tool that intentionally disrupted systems on the production network to discover systemic weaknesses so that they could be removed. Since then, the Chaos Monkey has grown to become the Simian Army, and chaos engineering has spread to a global community that develops free & commercial tools to facilitate experiments in QA and production.

My journey to chaos & resilience engineering started in 2009 with my desire to find a better way, leading me to the world of safety science and to its connection to the work at Netflix, Etsy, and elsewhere. In this talk, I’ll explain chaos engineering, the prerequisites for doing it in production, and how it relates to resilience. I will share some of the work I’ve done in chaos engineering (in a small way) and resilience engineering (in a larger way), and also ask attendees to share their own experiences in chaos & resilience engineering - you might not realize how easy it is to get started, or know that you’re already doing it!

My Journey to Chaos Engineering