{"id":1032,"date":"2020-05-20T09:28:21","date_gmt":"2020-05-20T09:28:21","guid":{"rendered":"https:\/\/tomgeraghty.co.uk\/?p=1032"},"modified":"2024-04-05T08:50:07","modified_gmt":"2024-04-05T08:50:07","slug":"resilience-engineering-and-psychological-safety","status":"publish","type":"post","link":"https:\/\/tomgeraghty.co.uk\/index.php\/resilience-engineering-and-psychological-safety\/","title":{"rendered":"DevOps, Psychological Safety and Resilience Engineering"},"content":{"rendered":"<h1><span style=\"font-weight: 400;\">The Links Between Psychological Safety, Resilience Engineering and DevOps<\/span><\/h1>\n<p>Note: since writing this, I&#8217;ve learned a lot more about <a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/resilience-engineering-and-devops\/\">resilience engineering and its relation to DevOps and psychological safety.\u00a0<\/a><\/p>\n<hr \/>\n<p><span style=\"font-weight: 400;\">Psychological safety is cited as the key factor in team performance by numerous studies including <\/span><a href=\"https:\/\/psychsafety.co.uk\/googles-project-aristotle\/\"><span style=\"font-weight: 400;\">Google\u2019s Project Aristotle<\/span><\/a><span style=\"font-weight: 400;\"> and the <\/span><a href=\"https:\/\/cloud.google.com\/blog\/products\/devops-sre\/the-2019-accelerate-state-of-devops-elite-performance-productivity-and-scaling\"><span style=\"font-weight: 400;\">DORA\/Google State Of DevOps Reports<\/span><\/a><span style=\"font-weight: 400;\">. 
The <\/span><a href=\"https:\/\/journals.sagepub.com\/doi\/abs\/10.2307\/2666999\"><span style=\"font-weight: 400;\">evidence shows <\/span><\/a><span style=\"font-weight: 400;\">that teams that operate in psychologically safe environments, where they can present their true selves at work, take risks, admit mistakes, and ask for support from their teammates, <a href=\"https:\/\/www.psychsafety.co.uk\/create-psychological-safety-in-your-workplace\/\">significantly outperform other teams<\/a>.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1036\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2019\/12\/psychological-safety-google-1-264x300.png\" alt=\"\" width=\"264\" height=\"300\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/psychological-safety-google-1-264x300.png 264w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/psychological-safety-google-1.png 730w\" sizes=\"auto, (max-width: 264px) 100vw, 264px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">However, establishing psychological safety is rarely prioritised by delivery-focussed leaders who use output-oriented metrics. Instead, these leaders tend to focus on <\/span><a href=\"https:\/\/medium.com\/startup-tools\/okrs-5afdc298bc28\"><span style=\"font-weight: 400;\">objectives, metrics<\/span><\/a><span style=\"font-weight: 400;\">, and modern practices such as <\/span><a href=\"https:\/\/medium.com\/west-stringfellow\/building-product-teams-examples-from-amazon-google-apple-basecamp-and-fog-creek-d222c9bc4317\"><span style=\"font-weight: 400;\">value-stream alignment and cross-functional teams<\/span><\/a><span style=\"font-weight: 400;\">. 
While these have great value and will go some way, or indeed a long way, towards driving performance and delivery, they are not the full picture.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It can be very challenging, particularly for less experienced leaders, or capable leaders in difficult circumstances, to build and facilitate <a href=\"https:\/\/www.psychsafety.co.uk\/about-psychological-safety\/\">psychologically safe environments<\/a>. This is particularly true in technologically-oriented organisations where the domain is complex and failure is explicit and obvious, and can have a large blast radius.\u00a0<\/span><\/p>\n<h2>Mistakes happen. They must happen.<\/h2>\n<p><span style=\"font-weight: 400;\">In a psychologically unsafe team, a software engineer who makes a mistake in a <a href=\"https:\/\/www.cognitive-edge.com\/the-cynefin-framework\/\">complex system<\/a> and releases a small flaw into production that later causes an outage may be blamed for the mistake. The flaw will be easily attributable, and the impact of the outage can be significant. In many organisations, the <\/span><a href=\"https:\/\/www.forbes.com\/sites\/steveblank\/2015\/03\/09\/fear-of-failure-and-lack-of-speed-in-a-large-corporation\/#1e869db65b74\"><span style=\"font-weight: 400;\">resultant fear of error can dramatically slow down the rate of change and speed to market<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Of course, the converse is not appealing either; it\u2019s not acceptable to tolerate errors, outages, and mistakes. Speed to market with a faulty product or service may be as bad as a significant delay in reaching the market. 
<\/span><a href=\"https:\/\/www.bloomberg.com\/news\/photo-essays\/2013-01-17\/the-most-expensive-product-recalls\"><span style=\"font-weight: 400;\">Customers do not tolerate poor quality services<\/span><\/a><span style=\"font-weight: 400;\">, so we need to build high quality services and do it at velocity. This delivery at pace is one of the key tenets of DevOps, and an effective DevOps culture requires psychological safety.<\/span><\/p>\n<h2>Resilience Engineering and Psychological Safety<\/h2>\n<p><span style=\"font-weight: 400;\">In my work on <\/span><a href=\"https:\/\/www.computing.co.uk\/ctg\/news\/3082689\/psychological-safety-teams\"><span style=\"font-weight: 400;\">psychological safety in high performance teams<\/span><\/a><span style=\"font-weight: 400;\">, I\u2019m often asked about how to achieve it, and whilst there are many general approaches that overlap significantly with principles of excellence in servant (or empathetic) leadership, there are also specific actions and approaches that are suitable specifically for technology teams. Here, we\u2019re going to drill down into one of the key aspects of a DevOps approach: Resilience Engineering, and how psychological safety is a fundamental component of resilience.<\/span><\/p>\n<p><a href=\"http:\/\/erikhollnagel.com\/ideas\/resilience-engineering.html\"><span style=\"font-weight: 400;\">Resilience Engineering<\/span><\/a><span style=\"font-weight: 400;\"> is a field of study that emerged from cognitive system engineering in the early 2000s, largely in response to NASA events in 1999 and 2000, including the failure of the Mars Climate Orbiter. 
It is defined as &#8220;<\/span><i><span style=\"font-weight: 400;\">The intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions.<\/span><\/i><span style=\"font-weight: 400;\">&#8221; Erik Hollnagel<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1035\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2019\/12\/mars-climate-orbiter-216x300.jpg\" alt=\"\" width=\"216\" height=\"300\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Resilience Engineering is the intentional engineering of a system (a sociotechnical system, such as a community,\u00a0 organisation, or nation) to anticipate, detect and respond to both external and internal changes, planned or unplanned, to the system itself and continue to operate whilst change occurs. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Very little theory within this domain has been generated that doesn\u2019t emerge from studies of real work; Resilience Engineering exists within high-stakes domains such as aviation, construction, surgery, military agencies and law enforcement and is becoming more visible in DevOps and Digital Transformation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There is a difference between robustness and resilience engineering,\u00a0 as described by David Woods, Professor, Integrated Systems Engineering Faculty, Ohio State University. 
Technological measures such as autoscaling, failovers, retries and queues, for example, only really contribute to <strong>robustness<\/strong>, not resilience:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Decoupling and reducing dependencies between components<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Utilising microservices and containerisation<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Autoscaling applications based on demand<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Creating self-healing applications and systems<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Using monitoring and visibility tools to facilitate responses to out-of-bounds events<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Adopting error budgeting instead of (or in addition to) uptime measures<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Using automated code testing, continuous integration and advanced deployment practices<\/span><\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-1346\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2020\/11\/Screenshot-2020-11-17-at-14.20.04-300x158.png\" alt=\"robustness vs resilience\" width=\"300\" height=\"158\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-14.20.04-300x158.png 300w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2020\/11\/Screenshot-2020-11-17-at-14.20.04.png 612w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">These approaches extend to concepts such as <\/span><a href=\"https:\/\/medium.com\/netflix-techblog\/the-netflix-simian-army-16e57fbab116\"><span style=\"font-weight: 400;\">chaos engineering<\/span><\/a><span 
style=\"font-weight: 400;\">. This is where flaws and interruptions are intentionally introduced in order to examine how the system behaves and help engineers identify how they can improve it.<\/span><\/p>\n<p>DevOps practices such as these help to build and improve psychological safety by facilitating safe risk taking. <span style=\"font-weight: 400;\">In turn, Resilience Engineering requires psychological safety to be present, because it is only psychological safety that enables people (the adaptable component of a system) to anticipate, respond and adapt to changes and challenges.<\/span><\/p>\n<h2>The formation of DevOps teams<\/h2>\n<p><span style=\"font-weight: 400;\">As a new DevOps-oriented team moves through <\/span><a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-88-tuckmans-model\/\"><span style=\"font-weight: 400;\">Tuckman\u2019s Forming-Storming-Norming-Performing cycle<\/span><\/a><span style=\"font-weight: 400;\">, it relies more and more on cultures and practices that facilitate risk taking and admitting mistakes. If these practices are not embedded, the team will never be able to progress to the \u201cperforming\u201d stage, because high performance explicitly requires innovation, and therefore, risk taking. 
Without psychological safety, teams will cycle around the Storming and Norming phases as elements change in or around the team, such as people leaving or joining.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1034\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2019\/12\/Screenshot-2019-12-19-at-16.14.43-300x235.png\" alt=\"\" width=\"300\" height=\"235\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.43-300x235.png 300w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.43-768x602.png 768w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.43.png 880w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">It is only once an engineering team reaches the high performing stage that they can truly deliver high quality services at velocity. By utilising resilience engineering principles and DevOps practices, engineers are supported to take risks, experiment, deploy changes and recover quickly. They can feel comfortable in the knowledge that if something did go wrong, they\u2019d find out straight away, before customers start calling. 
These practices, far from being so-called \u201csoft\u201d skills, are <\/span><a href=\"https:\/\/medium.com\/ingeniouslysimple\/learning-from-the-accelerate-four-key-metrics-91725675e30a\"><span style=\"font-weight: 400;\">measurable by solid metrics that describe velocity whilst maintaining reliability, such as Change Rate, Mean Time Between Failures (MTBF) and Mean Time To Restore (MTTR).<\/span><\/a><\/p>\n<h2>Engineering Team Topologies<\/h2>\n<p><span style=\"font-weight: 400;\">Resilience Engineering shares many capabilities with the concept of Site Reliability Engineering (SRE), introduced by <\/span><a href=\"https:\/\/landing.google.com\/sre\/interview\/ben-treynor\/\"><span style=\"font-weight: 400;\">Ben Treynor\u2019s team at Google in 2004.<\/span><\/a><span style=\"font-weight: 400;\"> SRE practices and capabilities may be implemented by an expert, dedicated, shared SRE team, or it may suit your organisation to embed an SRE function into each stream-aligned (SA) team if the products and systems are large enough to justify it. 
<\/span><a href=\"https:\/\/techbeacon.com\/enterprise-it\/sre-practice-5-insights-googles-experience\"><span style=\"font-weight: 400;\">Alternatively, it may be feasible to empower software engineers themselves to carry out SRE responsibilities if your systems are small enough<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1033\" src=\"https:\/\/tomgeraghtywordpress.s3-eu-west-1.amazonaws.com\/2019\/12\/Screenshot-2019-12-19-at-16.14.56-300x168.png\" alt=\"\" width=\"300\" height=\"168\" srcset=\"https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.56-300x168.png 300w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.56-768x430.png 768w, https:\/\/tomgeraghty.co.uk\/wp-content\/uploads\/2019\/12\/Screenshot-2019-12-19-at-16.14.56.png 890w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In addition to expert leadership practice, well organised teams, adopted shared values, <\/span><a href=\"https:\/\/codeascraft.com\/2012\/05\/22\/blameless-postmortems\/\"><span style=\"font-weight: 400;\">systemic root causes being diagnosed<\/span><\/a><span style=\"font-weight: 400;\"> in retrospectives, and an embrace of continuous improvement, we must adopt capabilities that empower team members to carry out their roles without fear of failure. 
In a technology team, those capabilities are the very same ones that enable high-velocity change, security, and reliability.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the subject of technology team organisation, Matthew Skelton and Manuel Pais explore in great depth how software teams can be organised to deliver the most value, safety, and performance in their book \u201c<\/span><a href=\"https:\/\/teamtopologies.com\/\"><span style=\"font-weight: 400;\">Team Topologies<\/span><\/a><span style=\"font-weight: 400;\">\u201d, where they examine how the concepts of <\/span><a href=\"https:\/\/psychsafety.co.uk\/psychological-safety-conways-law\/\"><span style=\"font-weight: 400;\">Conway\u2019s Law<\/span><\/a><span style=\"font-weight: 400;\">, Cognitive Load and Organisational Evolution converge.\u00a0<\/span><\/p>\n<h2>Resilience engineering is about entire organisations, not just technology.<\/h2>\n<p><span style=\"font-weight: 400;\">Whatever team you\u2019re on, or whatever team you lead, considering resilience engineering principles will improve delivery, safety, happiness, and performance. This enables people to work without fear, psychologically safe in the knowledge that <\/span><a href=\"https:\/\/blog.deming.org\/2012\/11\/inspection-is-too-late-the-quality-good-or-bad-is-already-in-the-product\/\"><span style=\"font-weight: 400;\">errors do not flow downstream<\/span><\/a><span style=\"font-weight: 400;\">. 
This is where true high performance, speed to market, quality and innovation happen.<\/span><\/p>\n<p><a href=\"https:\/\/www.psychsafety.co.uk\/psychological-safety-and-agile-teams\/\">Psychological safety is also a core component of Agile delivery teams<\/a>, as it fundamentally enables truthful communication, response to change, and the ability to make mistakes and innovate.<\/p>\n<h2>Build your own high performing teams with psychological safety<\/h2>\n<p><span style=\"font-weight: 400;\">For more information about high performance teams, psychological safety, DevOps, or any of the other concepts covered in this article, <\/span><a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/contact-me\/\"><span style=\"font-weight: 400;\">get in touch<\/span><\/a><span style=\"font-weight: 400;\">. I\u2019m always <\/span><span style=\"font-weight: 400;\">open to collaboration, speaking at events, podcasts, or other ways to get involved and help teams become more productive, safer, and most importantly, happier.<\/span><\/p>\n<p>Download a complete <a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/psychological-safety-action-pack\/\">Psychological Safety Action Pack<\/a> full of workshops, tools, resources, and posters to help you measure, build, and maintain Psychological Safety in your teams.<\/p>\n<p><span style=\"font-weight: 400;\">@tom_geraghty<\/span><\/p>\n<p><a href=\"mailto:tom@tomgeraghty.co.uk\"><span style=\"font-weight: 400;\">tom@tomgeraghty.co.uk<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Links Between Psychological Safety, Resilience Engineering and DevOps Note: since writing this, I&#8217;ve learned a lot more about resilience engineering and its relation to DevOps and psychological safety.\u00a0 Psychological safety is cited as the key factor in team performance by numerous studies including Google\u2019s Project Aristotle and the DORA\/Google State Of DevOps Reports. 
The &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/tomgeraghty.co.uk\/index.php\/resilience-engineering-and-psychological-safety\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;DevOps, Psychological Safety and Resilience Engineering&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,118,4],"tags":[102,116,122,77],"class_list":["post-1032","post","type-post","status-publish","format-standard","hentry","category-blog","category-psychological-safety","category-tech","tag-devops","tag-psychological-safety","tag-resilience-engineering","tag-technology"],"_links":{"self":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1032","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/comments?post=1032"}],"version-history":[{"count":17,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1032\/revisions"}],"predecessor-version":[{"id":2197,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/posts\/1032\/revisions\/2197"}],"wp:attachment":[{"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/media?parent=1032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/categories?post=1032"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tomgeraghty.co.uk\/index.php\/wp-json\/wp\/v2\/tags?post=1032"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}