Information Safety

Improving technology through lessons from safety.

CONOPS (Concept of Operations)

I recently came across a posting on Design Docs at Google. I was struck by the similarities between the design document, as described in the article, and a Concept of Operations (CONOPS). Traditionally, CONOPS are used primarily by the military for very large and costly projects, such as the design of a new Coast Guard Cutter. They are created prior to official design documentation and mainly serve to satisfy project requirements, so they are not something you’d expect a modern software organization like Google to use.

In my own work, I’ve come to believe that a shared mental model of the application or service the team is building is essential for reliability and resilience. There is research that suggests an agile CONOPS can help develop a shared mental model amongst stakeholders by using visualization, models, and system thinking. My own brief experiment with CONOPS found that creating a visual diagram was the most valuable part; the formal CONOPS outline, defined in IEEE standard 1362, was less useful.

What’s interesting about the Google Design Doc is that it includes important elements of the CONOPS. The article identifies the following functions of the design document (emphasis mine):

  • Early identification of design issues when making changes is still cheap.
  • Achieving consensus around a design in the organization.
  • Ensuring consideration of cross-cutting concerns.
  • Scaling knowledge of senior engineers into the organization.
  • Form the basis of an organizational memory around design decisions.
  • Acts as a summary artifact in the technical portfolio of the software designer(s).

It’s notable that four of the six functions relate to the development of a shared mental model of the system being built - across the engineering organization, with security & privacy, with senior engineers, and for posterity. Additionally, I argue that many of the features described would also be found in a good CONOPS: Goals and Non-Goals, visual diagrams, and existing constraints. Unsurprisingly, the post also recommends making the document only as long as needed, avoiding an ‘implementation manual’, and iterating.

I’d agree with all of that, and would also suggest one additional lesson from well-written CONOPS: adding operational scenarios, as included in the CPC CONOPS mentioned earlier, can be an effective tool for helping people understand what’s being proposed and how the designers envision it being used. Having specific narratives helps ‘make it real’ and makes implicit assumptions explicit.

Bottom Line: whether you call it a CONOPS or a design document, creating a high-level description of what you’re planning to build, without getting into the weeds, is an underutilized but effective way to build better software systems. Focus on visualization and creating a common mental model for the organization (including our future selves), iterate, and consider using scenarios to help build understanding.

Failover Conf

Back in April, I attended Failover Conf, a virtual conference hosted by Gremlin. Overall I thought the conference was pretty good, but as with all conferences, the usefulness of the talks varied. The influence of safety thinking was clear, especially Resilience Engineering, which was explicitly covered in two talks (Amy Tobey and J Paul Reed).

The highlights for me were two talks on Site Reliability Engineering (SRE): one by Jennifer Petoff on SRE training at Google, and one by Danyel Fisher & Liz Fong-Jones on implementing SRE at honeycomb.io. SRE is an interesting practice; it’s essentially “how Google implemented operations at scale,” making the conference an interesting blend of theory (Resilience Engineering) and practice (SRE).

The downside of the conference was the unusually high number of marketing emails participants received; I mean, I know it’s a free conference, but even Gremlin admitted there were too many. Thankfully, you can watch all the talks without registration here.

The conference also had a dedicated Slack for discussion during and after the talks, which was for me at least as interesting as the talks themselves. From the Slack discussion, I got recommendations on some additional academic reading on Resilience Engineering from J Paul Reed, which I am sharing here:

Secure360 Handouts

Secure360 Update: I’ve been asked by a couple of people to share a version of my slides that better shows how my talk presented the ideas in my references post.

To answer that request, I’ve posted a low-res version of the slides with some of my talk notes here.

These notes will probably make more sense if you’ve seen the talk, which was recorded for conference attendees (but is not currently publicly available).

Update: Since Secure360 was fully virtual due to COVID in 2020, a video of the talk is available here, and for some reason, also here.
