{"id":1344,"date":"2020-11-17T15:26:26","date_gmt":"2020-11-17T15:26:26","guid":{"rendered":"https:\/\/tomgeraghty.co.uk\/?p=1344"},"modified":"2023-06-19T08:46:15","modified_gmt":"2023-06-19T08:46:15","slug":"resilience-engineering-and-devops","status":"publish","type":"post","link":"https:\/\/tomgeraghty.co.uk\/index.php\/resilience-engineering-and-devops\/","title":{"rendered":"Resilience Engineering and DevOps &#8211; A Deeper Dive"},"content":{"rendered":"<p><em><strong>[This is a work in progress. If you spot an error, or would like to contribute, please <a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/contact-me\/\">get in touch<\/a>]<\/strong><\/em><\/p>\n<p>The term &#8220;Resilience Engineering&#8221; is appearing more frequently in the DevOps domain, field of physical safety, and other industries, but there exists some argument about what it really means. That disagreement doesn&#8217;t seem to occur in those domains where Resilience Engineering has been more prevalent and applied for some time now, such as healthcare and aviation. Resilience Engineering is an academic field of study and practice in its own right. There is even a <a href=\"https:\/\/www.resilience-engineering-association.org\/\">Resilience Engineering Association<\/a>.<\/p>\n<p><strong>Resilience Engineering<\/strong> is a multidisciplinary field associated with safety science, complexity, human factors and associated domains that focuses on understanding how complex adaptive systems cope with, and learn from, surprise.<\/p>\n<p>It addresses human factors, ergonomics, complexity, non-linearity, inter-dependencies, emergence, formal and informal social structures, threats and opportunities. 
A common refrain in the field of resilience engineering is &#8220;<a href=\"https:\/\/dvddpl.github.io\/2021\/02\/25\/there-is-no-root-cause.html\">there is no root cause<\/a>&#8221;, and blaming incidents on &#8220;human error&#8221; is also known to be counterproductive, as <a href=\"https:\/\/sidneydekker.com\/books\/\">Sidney Dekker explains so eloquently in &#8220;The Field Guide To Understanding Human Error&#8221;<\/a>.<\/p>\n<p><span style=\"font-weight: 400;\">Resilience is &#8220;<\/span><i><span style=\"font-weight: 400;\">The intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions.<\/span><\/i><span style=\"font-weight: 400;\">&#8221; <a href=\"http:\/\/erikhollnagel.com\/ideas\/resilience-engineering.html\">Prof Erik Hollnagel<\/a><\/span><\/p>\n<p><b>It is the \u201csustained adaptive capacity\u201d of a system, organisation, or community.<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Resilience engineering has the word \u201cengineering\u201d in it, which typically makes us think of machines, structures, or code, and this is a little misleading. Instead, try thinking of engineering as the process of response, creation and change.<\/span><\/p>\n<h2>Systems<\/h2>\n<p><span style=\"font-weight: 400;\">Resilience Engineering also refers to \u201csystems\u201d, which might also lead you down a certain mental path of mechanical or digital systems. Widen your concept of systems from software and machines, to organisations, societies, ecosystems, even solar systems. They\u2019re all systems in the broader sense.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Resilience engineering refers in particular to complex systems, and typically, complex systems involve people.
Human beings like you and me (I don&#8217;t wish to be presumptuous but I&#8217;m assuming that you&#8217;re a human reading this).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Consider <a href=\"https:\/\/www.cognitive-edge.com\/the-cynefin-framework\/\">Dave Snowden\u2019s Cynefin framework<\/a>:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1345 aligncenter\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2020\/11\/Screenshot-2020-11-17-at-13.40.30.png\" alt=\"cynefin\" width=\"422\" height=\"441\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-13.40.30.png 422w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-13.40.30-287x300.png 287w\" sizes=\"auto, (max-width: 422px) 100vw, 422px\" \/><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Systems in an Obvious<\/strong> state are fairly easy to deal with. <strong>There are no unknowns<\/strong> &#8211; they\u2019re fixed and repeatable in nature, and the same process achieves the same result each time, so that we humans can use things like Standard Operating Procedures to work with them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Systems in a Complicated<\/strong> state are large, usually too large for us humans to hold in our heads in their entirety, but are finite and have fixed rules. They possess<strong> known unknowns<\/strong> &#8211; by which we mean that you can find the answer if you know where to look. A modern motorcar and a game of chess are both complicated &#8211; but possess fixed rules that do not change.
With expertise and good practice, such as employed by surgeons or engineers or chess players, we can work with systems in complicated states.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Systems in a Complex<\/strong> state possess <strong>unknown-unknowns<\/strong>, and include realms such as battlefields, ecosystems, organisations and teams, or humans themselves. The practice in complex systems is probe, sense, and respond. Complexity resists reductionist attempts at determining cause and effect because the rules are not fixed, therefore the effects of changes can themselves change over time, and even the attempt of measuring or sensing in a complex system can affect the system. When working with complex states, feedback loops that facilitate continuous learning about the changing system are crucial.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Systems in a Chaotic<\/strong> state are <strong>impossible to predict<\/strong>. Examples include emergency departments or crisis situations. There are no real rules to speak of, even ones that change. In these cases, acting first is necessary. 
Communication is rapid, and top-down or broadcast, because there is no time, or indeed any use, for debate.<\/span><\/p>\n<h2>Resilience<\/h2>\n<p><span style=\"font-weight: 400;\">As Erik Hollnagel has said repeatedly since Resilience Engineering began (Hollnagel &amp; Woods, 2006), <a href=\"https:\/\/books.google.co.uk\/books?id=rygf6axAH7UC\">resilience is about what a system can do \u2014 including its capacity:\u00a0<\/a><\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400;\"><strong> to anticipate<\/strong> \u2014 seeing developing signs of trouble ahead to begin to adapt early and reduce the risk of decompensation\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400;\"><strong> to synchronize<\/strong> \u2014\u00a0 adjusting how different roles at different levels coordinate their activities to keep pace with tempo of events and reduce the risk of working at cross purposes\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400;\"><strong> to be ready to respond<\/strong> \u2014 developing deployable and mobilizable response capabilities in advance of surprises and reduce the risk of brittleness\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400;\"><strong> for proactive learning<\/strong> \u2014 learning about brittleness and sources of resilient performance before major collapses or accidents occur by studying how surprises are caught and resolved\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">(From <a href=\"https:\/\/www.researchgate.net\/publication\/329035477_Resilience_is_a_Verb\"><em>Resilience is a Verb<\/em> by David D. 
Woods<\/a>)<\/span><\/p>\n<p>&nbsp;<\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Capacity<\/b><\/td>\n<td><b>Description<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Anticipation<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Create foresight about future operating conditions, revise models of risk<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Readiness to respond<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Maintain deployable reserve resources available to keep pace with demand<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Synchronization<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Coordinate information flows and actions across the networked system<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Proactive learning<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Search for brittleness, gaps in understanding, trade-offs, re-prioritisations<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">Provan et al (2020) build upon Hollnagel&#8217;s four aspects of resilience to show that resilient people and organisations must possess a \u201c<\/span><i><span style=\"font-weight: 400;\">Readiness to respond<\/span><\/i><span style=\"font-weight: 400;\">\u201d, and state: &#8220;This requires employees to have the psychological safety to apply their judgement without fear of repercussion.&#8221;<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Resilience is therefore something that a system \u201c<\/span><b>does<\/b><span style=\"font-weight: 400;\">\u201d, not \u201c<\/span><b>has<\/b><span style=\"font-weight: 400;\">\u201d.<\/span><\/h2>\n<p>Systems comprise structures, technology, rules, inputs and outputs, and most importantly, <strong>people<\/strong>.<\/p>\n<p><span style=\"font-weight: 400;\">\u201c<em>Resilience is about the creation and sustaining of various conditions that enable systems to adapt to unforeseen events.
*People* are the adaptable element of those systems<\/em>\u201d &#8211; John Allspaw (<a href=\"https:\/\/twitter.com\/allspaw\">@allspaw<\/a>) of <a href=\"https:\/\/www.adaptivecapacitylabs.com\/\">Adaptive Capacity Labs.<\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Resilience therefore is about &#8220;systems&#8221; adapting to unforeseen events, and the adaptability of people is fundamental to resilience engineering.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And if resilience is the potential to anticipate, respond, learn, and change, and people are part of the systems we\u2019re talking about:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We need to talk about <\/span><b>people<\/b><span style=\"font-weight: 400;\">: What makes <\/span><b>people<\/b><span style=\"font-weight: 400;\"> resilient?<\/span><\/p>\n<h2>Psychological safety<\/h2>\n<p><span style=\"font-weight: 400;\">Psychological safety is the key fundamental aspect of groups of people (whether that group is a team, organisation, community, or nation) that facilitates performance. It is the belief, within a group,<i>\u00a0&#8220;that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes<\/i>.&#8221; &#8211; <a href=\"https:\/\/journals.sagepub.com\/doi\/abs\/10.2307\/2666999\">Edmondson, 1999<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Amy Edmondson also talks about the concept of a \u201cLearning organisation\u201d &#8211; essentially a complex system operating in a vastly more complex, even chaotic wider environment. In a learning organisation, <em>employees continually create, acquire, and transfer knowledge\u2014helping their company adapt to the un-predictable faster than rivals can<\/em>. 
(Garvin et al, 2008)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8220;A <\/span><i><span style=\"font-weight: 400;\">resilient<\/span><\/i><span style=\"font-weight: 400;\"> organisation adapts effectively to surprise.&#8221; (Lorin Hochstein, Netflix)<\/span><\/p>\n<p>https:\/\/twitter.com\/cyetain\/status\/1242926422869651458<\/p>\n<p>In this sense, we can see that a &#8220;learning organisation&#8221; and a &#8220;resilient organisation&#8221; are fundamentally the same.<\/p>\n<p>Learning, resilient organisations must possess <a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-resilience-engineering\/\">psychological safety<\/a> in order to respond to changes and threats. They must also have clear goals, vision, processes and structures. According to <a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-conways-law\/\">Conway&#8217;s Law<\/a>:<\/p>\n<p>&#8220;Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation&#8217;s communication structure.&#8221;<\/p>\n<p>For both the organisation and the systems it has built to respond quickly to change, the organisation must be structured in such a way that response to change is as rapid as possible. What this looks like in practice will depend significantly on the organisation itself, but fundamentally, smaller, less tightly coupled, autonomous and expert teams will be able to respond to change faster than large, tightly bound teams with low autonomy.
<a href=\"https:\/\/teamtopologies.com\/\">Pais and Skelton&#8217;s Team Topologies<\/a> explores this in much more depth.<\/p>\n<h2>Engineer the conditions for resilience engineering<\/h2>\n<p><b>\u201cBefore you can engineer resilience, you must engineer the conditions in which it is possible to engineer resilience.\u201d <\/b>&#8211; Rein Henrichs (<a href=\"https:\/\/twitter.com\/ReinH\">@reinH<\/a>)<\/p>\n<p><span style=\"font-weight: 400;\">As we&#8217;ve seen, an essential component of learning organisations is psychological safety. <a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-resilience-engineering\/\">Psychological safety is a necessary condition (though not sufficient) for the conditions of resilience to be created and sustained.\u00a0<\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Therefore we must <a href=\"https:\/\/www.psychsafety.co.uk\/create-psychological-safety-in-your-workplace\/\">create psychological safety in our teams, our organisations, our human &#8220;systems&#8221;<\/a>. Without this, we cannot engineer resilience.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We create, build, and maintain psychological safety via three core behaviours:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Framing work as a <\/span><b>learning<\/b><span style=\"font-weight: 400;\"> problem, not an <\/span><b>execution<\/b><span style=\"font-weight: 400;\"> problem. The primary outcome should be knowing how to do it even better next time.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Acknowledging your own fallibility. You might be an expert, but you don\u2019t know everything, and you get things wrong &#8211; if you admit it when you do, you allow others to do the same.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Modelling curiosity &#8211; ask a lot of questions. This creates a need for voice.
When you ask questions, people HAVE to speak up.\u00a0<\/span><\/li>\n<\/ol>\n<h2>Resilience engineering and psychological safety<\/h2>\n<p><span style=\"font-weight: 400;\">Psychological safety enables these fundamental aspects of resilience &#8211; the sustained adaptive capacity of a team or organisation:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Taking risks and making changes that you don\u2019t, or can\u2019t, fully understand the outcomes of.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Admitting when you made a mistake.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Asking for help<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Contributing new ideas<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Detailed systemic cause* analysis (The ability to get detailed information about the \u201cmessy details\u201d of work)<\/span><\/li>\n<\/ul>\n<p>(*There is <a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/rothmans-causal-pies-in-root-cause-analysis\/\">never a single root cause<\/a>)<\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s go back to that phrase at the start:<\/span><\/p>\n<h2><b><i>Sustained adaptive capacity.<\/i><\/b><\/h2>\n<p><span style=\"font-weight: 400;\">What we\u2019re trying to create is an organisation, a complex system, and subsystems (maybe including all that software we\u2019re building) that possess a <\/span><b>capacity for sustained adaptation<\/b><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With DevOps we build systems that respond to demand and scale up and down; we implement redundancy and low dependency to allow for graceful failure; and we identify and react to security threats.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pretty much all of these only contribute to <a
href=\"https:\/\/github.com\/lorin\/resilience-engineering\/blob\/master\/intro.md\">robustness<\/a>.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1346\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2020\/11\/Screenshot-2020-11-17-at-14.20.04.png\" alt=\"robustness vs resilience\" width=\"612\" height=\"323\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-14.20.04.png 612w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-14.20.04-300x158.png 300w\" sizes=\"auto, (max-width: 612px) 100vw, 612px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">(David Woods, Professor, Integrated Systems Engineering Faculty, Ohio State University)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You may want to think back to the Cynefin framework, and think of robustness as being able to deal well with <\/span><i><span style=\"font-weight: 400;\">known unknowns<\/span><\/i><span style=\"font-weight: 400;\"> (complicated systems), and resilience as being able to deal well with <\/span><i><span style=\"font-weight: 400;\">unknown unknowns<\/span><\/i><span style=\"font-weight: 400;\"> (complex, even chaotic systems). Technological or DevOps practices that primarily focus on systems, such as microservices, containerisation, autoscaling, or distribution of components, build robustness, not resilience.<\/span><\/p>\n<p>However, if we are to build resilience, the sustained adaptive capacity for change, we can utilise DevOps practices for our benefit. Like psychological safety, none of them is sufficient on its own, but they are necessary. Using automation to reduce the cognitive load of people is important: by reducing the extraneous cognitive load, we maximise the germane, problem-solving capability of people.
The provision of other tools, internal platforms, automated testing pipelines, and increasing the observability of systems increases the ability of people and teams to respond to change, and increases their <strong>sustained adaptive capacity<\/strong>.<\/p>\n<p>If brittleness is the opposite of resilience, what does &#8220;good&#8221; resilience look like? The word &#8220;anti-fragility&#8221; appears to crop up fairly often, due to the book &#8220;<a href=\"https:\/\/amzn.to\/3DqyWDx\">Antifragile: Things that Gain from Disorder<\/a>&#8221; by\u00a0Nassim Taleb. What\u00a0Taleb describes as antifragile, ultimately, is resilience itself.<\/p>\n<p>I have my own views on this, but fundamentally I think this is the danger of academia (as in the field of resilience engineering) restricting access to knowledge. A lot of resilience engineering literature is held behind academic paywalls and journals, which most practitioners do not have access to.\u00a0\u00a0<strong>It should be of no huge surprise that people may reject a body of knowledge if they have no access to it.<\/strong><\/p>\n<h2>Observability<\/h2>\n<p>It is absolutely crucial to be able to observe what is happening inside the systems. This refers to anything from analysing system logs to identify errors or future problems, to managing Work In Progress (WIP) to highlight bottlenecks in a process.<\/p>\n<p>Too often, engineering and technology organisations look only inward, whilst many of the threats to systems are external to the system and the organisation. 
Observability must also concern external metrics and qualitative data: what is happening in the marketspace, the economy, and what are our competitors doing?<\/p>\n<h2><span style=\"font-weight: 400;\">Resilience Engineering and DevOps<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">What must we do?<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Create psychological safety<\/strong> &#8211; this means that people can ask for help, raise issues, highlight potential risks and &#8220;apply their judgement without fear of repercussion.&#8221; There&#8217;s a great piece here on <a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-resilience-engineering\/\">psychological safety and resilience engineering<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Manage cognitive load<\/strong> &#8211; so people can focus on the real problems of value &#8211; such as responding to unanticipated events.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Apply DevOps practices to technology <\/strong>&#8211; use automation, internal platforms and observability, amongst other DevOps practices.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Increase observability and monitoring\u00a0<\/strong>&#8211; this applies to systems (internal) and the world (external). People and systems cannot respond to a threat if they don&#8217;t see it coming.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Embed practices and expertise in component causal analysis\u00a0<\/strong>&#8211; whether you call it a post-mortem, retrospective or debrief, build the habits and expertise to routinely examine the systemic component causes of failure. 
<a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/rothmans-causal-pies-in-root-cause-analysis\/\">Try using Rothman&#8217;s Causal Pies in your next incident review.<\/a><\/span><\/p>\n<p><strong>Run &#8220;fire drills&#8221; and disaster exercises.<\/strong> Make it easier for humans to deal with emergencies and unexpected events by making it a habit. Increase the cognitive capacity available for problem solving in emergencies.<\/p>\n<p><span style=\"font-weight: 400;\"><strong>Structure the organisation in a way that facilitates adaptation<\/strong> and change. Consider appropriate team topologies to facilitate adaptability.<\/span><\/p>\n<h2>In summary<\/h2>\n<p>Through facilitating learning, responding, monitoring, and anticipating threats, we can create resilient organisations. DevOps and <a href=\"https:\/\/www.psychsafety.co.uk\/about-psychological-safety\/\">psychological safety<\/a>\u00a0are two important components of resilience engineering, and resilience engineering (in my opinion) is soon going to be seen as a core aspect of <a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/digital-transformation-and-enterprise-agility\/\">organisational (and digital) transformation<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>References:<\/strong><\/p>\n<p>Conway, M. E. (1968) How Do Committees Invent?\u00a0Datamation magazine.\u00a0F. D. Thompson Publications, Inc. Available at: https:\/\/www.melconway.com\/Home\/Committees_Paper.html<\/p>\n<p>Dekker, S., 2006. The Field Guide to Understanding Human Error. Ashgate Publishing Company, USA.<\/p>\n<p>Edmondson, A., 1999. Psychological safety and learning behavior in work teams. Administrative science quarterly, 44(2), pp.350-383.<\/p>\n<p>Garvin, David &amp; Edmondson, Amy &amp; Gino, Francesca. (2008). Is Yours a Learning Organization?. Harvard business review. 86. 109-16, 134.<\/p>\n<p>Hochstein, L. (2019) Resilience engineering: Where do I start?
Available at: https:\/\/github.com\/lorin\/resilience-engineering\/blob\/master\/intro.md (Accessed: 17 November 2020).<\/p>\n<p><span class=\"textheading1 mobile-undersized-upper\">Hollnagel, E., Woods, D. D. &amp; Leveson, N. C. (2006). Resilience engineering: Concepts and precepts. Aldershot, UK: Ashgate.<\/span><\/p>\n<p class=\"referenceString selectable\"><span class=\"textheading1 mobile-undersized-upper\">Hollnagel, E.\u00a0<\/span><i>Resilience Engineering<\/i>\u00a0(2020). Available at: https:\/\/erikhollnagel.com\/ideas\/resilience-engineering.html (Accessed: 17 November 2020).<\/p>\n<p><span style=\"font-weight: 400;\">Provan, D.J., Woods, D.D., Dekker, S.W. and Rae, A.J., 2020. Safety II professionals: how resilience engineering can transform safety practice. <\/span><i><span style=\"font-weight: 400;\">Reliability Engineering &amp; System Safety<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">195<\/span><\/i><span style=\"font-weight: 400;\">, p.106740. Available at <\/span><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0951832018309864\"><span style=\"font-weight: 400;\">https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0951832018309864<\/span><\/a><\/p>\n<p>Woods, D. D. (2018). Resilience is a verb. In Trump, B. D., Florin, M.-V., &amp; Linkov, I.<br \/>\n(Eds.). IRGC resource guide on resilience (vol. 2): Domains of resilience for complex interconnected\u00a0systems. Lausanne, CH: EPFL International Risk Governance Center. Available on irgc.epfl.ch and irgc.org.<\/p>\n<p>John Allspaw has collated an <a href=\"https:\/\/www.goodreads.com\/review\/list\/214458-john?order=a&amp;shelf=resilience-engineering&amp;sort=date_added\">excellent book list for essential reading on resilience engineering here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[This is a work in progress. 
If you spot an error, or would like to contribute, please get in touch] The term &#8220;Resilience Engineering&#8221; is appearing more frequently in the DevOps domain, field of physical safety, and other industries, but there exists some argument about what it really means. That disagreement doesn&#8217;t seem to occur &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/resilience-engineering-and-devops\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Resilience Engineering and DevOps &#8211; A Deeper Dive&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1346,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,4],"tags":[102,116,122,77],"class_list":["post-1344","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","category-tech","tag-devops","tag-psychological-safety","tag-resilience-engineering","tag-technology"],"_links":{"self":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1344","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/comments?post=1344"}],"version-history":[{"count":29,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1344\/revisions"}],"predecessor-version":[{"id":2108,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1344\/revisions\/2108"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/media\/1346"}],"wp:attachment":[{"href":"https:\/\/tomge
raghty.co.uk\/index.php\/wp-json\/wp\/v2\/media?parent=1344"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/categories?post=1344"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/tags?post=1344"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}