“Root” Cause Analysis using Rothman’s Causal Pies

Context: It sometimes seems to me that in the tech industry, maybe because we’re often playing with new technologies and innovating in our organisation, or even field, (when we’re not trying to pay down tech debt and keep legacy systems running), we’re sometimes guilty of not looking outside our sphere for better practices and new (or even old) ideas.

Rothman’s Causal Pies

Whilst studying for my Master’s degree in Global Health, I discovered the concept of “Rothman’s Causal Pies”.

The Epidemiological Triad

Epidemiology is the study of why and how diseases (including non-communicable diseases) occur. As a field, it encompasses the entire realm of human existence, from environmental and biological aspects to heuristics and even economics. It’s a real exercise in Systems Thinking, which is kinda why I love it.

In epidemiology, there is a concept known as the “Epidemiological Triad”, which describes the necessary relationship between agent, host, and environment. When all three are present, the disease can occur. Without one or more of those three factors, the disease cannot occur. It’s a very simplistic but useful model. As we know, all models are wrong, but some are useful.

This concept is useful because through understanding this triad, it’s possible to identify an intervention to reduce the incidence of, or even eradicate, a disease, such as by changing something in the environment (say, by providing clean drinking water) or a vaccination programme (changing something about the host).

What the triad doesn’t provide, however, is a description of the various factors necessary for the disease to occur, and this is especially relevant to non-communicable diseases (NCDs), such as back pain, coronary heart disease, or a mental health problem. In these cases, there may be many different components, or causal factors. Some of these may be “necessary”, whilst some may contribute. There may be many different combinations of causes that result in the disease.

To use heart disease as an example, the component causes, or “risk factors” could include poor diet, little or no exercise, genetic predisposition, smoking, alcohol, and many more. No single component is sufficient to cause the disease, and one (genetic predisposition, for example) may be necessary in all cases.

Rothman, in 1976, came up with a model that demonstrates the multifactorial nature of causation.

Rothman’s Causal Pies

An individual factor that contributes to cause disease is shown as a piece of a pie, like the triangles in the game Trivial Pursuit. After all the pieces of a pie fall into place, the pie is complete, and disease occurs.

The individual factors are called component causes. The complete pie, which is termed a causal pathway, is called a sufficient cause. A disease may have more than one sufficient cause, with each sufficient cause being composed of several component causes that may or may not overlap. A component that appears in every single pie or pathway is called a necessary cause, because without it, disease does not occur. An example of this is the role that genetic factors play in haemophilia in humans – haemophilia will not occur without a specific gene defect, but the gene defect is not believed to be sufficient in isolation to cause the disease.

An example: Note in the image below that component cause A is a necessary cause because it appears in every pie. But this should not mean that it is the “root cause”, because it is not sufficient on its own.
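The definitions above translate directly into set operations. The following is a minimal sketch, assuming three pies with hypothetical component labels A to J (illustrative only, not data from a real disease or incident): a necessary cause is whatever appears in every pie, and a set of present components is sufficient only when it completes at least one whole pie.

```python
# Each sufficient cause (a complete pie) is modelled as a set of component causes.
# The letters are hypothetical labels chosen for illustration.
pies = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "F", "G", "H"},
    {"A", "C", "F", "I", "J"},
]


def necessary_causes(pies):
    """Return the components that appear in every pie."""
    if not pies:
        return set()
    return set.intersection(*pies)


def is_sufficient(present, pies):
    """True if the components present complete at least one whole pie."""
    return any(pie <= present for pie in pies)


print(necessary_causes(pies))
print(is_sufficient({"A"}, pies))
print(is_sufficient({"A", "B", "C", "D", "E"}, pies))
```

In this sketch A is the only necessary cause, yet A alone completes no pie, which is why a necessary cause should not be mistaken for a “root cause”.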

Root Cause Analysis

I’m a huge proponent of holding regular retrospectives (for incidents, failures, successes, and simply at regular intervals), but it seems that in technology, particularly when we’re carrying out a Root Cause Analysis due to an incident, there’s a tendency to assume one single “root cause” – the smoking gun that caused the problem.

We may tend towards assuming that once we’ve found this necessary cause, we’re finished. And whilst that’s certainly a useful exercise, it’s important to recognise that there are other component causes and there may be more than one sufficient cause.

The Five Whys model is a great example of this – it fails to probe into other component factors, and only looks for a single root cause. As any resilience engineer will tell you: There is no Single Root Cause.

The 5 whys takes the team down a single linear path, and will certainly find a root cause, but leaves the team blind to other potential component or sufficient causes – and even worse: it leads the team to believe that they’ve identified the problem. In the worst case scenario, a team may identify “human error” as a root cause, which could re-affirm a faulty, overly-simplistic world view and result not only in the wrong cause being identified, but also in harm to the team’s ability to carry out RCAs in the future.

Read more about the flaws in the “five whys” model in John Allspaw’s “Infinite Hows”. Allspaw has recently published another great piece about “root causes” in this blog article.

In reality, we’re dealing with complex, maybe even chaotic states, alongside human interactions. There exist multiple causal factors, some necessary for the “incident” to have occurred, and some simply component causes that together become sufficient – the completed pie!

Take Away: There is usually more than one causal pie.

An improved approach could be to use Ishikawa diagrams, but in my experience, particularly when dealing with complex systems, these diagrams very quickly become visibly cluttered and complex, which makes them hard to use. Additionally, because each “fish bone” is treated as a separate pathway, interrelationships between causes may not be identified.

Instead of a complex fishbone diagram, try identifying all the component causes, and visually complete (on a whiteboard for example) all the pies that could (or did) result in the outcome. You almost certainly won’t identify all of them, but that doesn’t matter very much.

If we adopt Rothman’s causal pie model instead of approaches such as the 5 whys or Ishikawa, it provides us with an easy to use and easy to visualise tool that can model not only “what caused this incident”, but “what factors, if present, could cause this incident to occur again?”.

In order to prevent the incident (the disease, in epidemiological terms), the key factor we’re looking for is the “necessary cause” – component A in the pies diagram. But we’re also looking for the other component causes.

Application: The prevention of future incidents.

Suppose we can’t easily solve component A – maybe it’s a third party system that’s outside our control – but we can control causal components B and C, at least one of which occurs in every causal pie. If we control for those instead, no pie can be completed, so we don’t need to worry about component A anyway!
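Finding which components to control can be sketched as a small search: look for the smallest sets of controllable components that remove at least one slice from every pie. This is a minimal sketch under stated assumptions; the pies and their labels are hypothetical, component A is assumed to be outside our control, and the brute-force search is only practical for the handful of components you would draw on a whiteboard.

```python
from itertools import combinations

# Hypothetical pies: each set is one sufficient cause for the incident.
pies = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "F", "G", "H"},
    {"A", "C", "F", "I", "J"},
]


def blocks_all(controlled, pies):
    """True if every pie loses at least one component once these are controlled."""
    return all(pie & controlled for pie in pies)


def smallest_blocking_sets(pies, controllable):
    """Return the smallest subsets of controllable components that break every pie."""
    candidates = sorted(controllable)
    for size in range(1, len(candidates) + 1):
        found = [set(combo) for combo in combinations(candidates, size)
                 if blocks_all(set(combo), pies)]
        if found:
            return found
    return []


# Assumption for this example: A is a third party system we cannot change.
controllable = set().union(*pies) - {"A"}
for option in smallest_blocking_sets(pies, controllable):
    print(sorted(option))
```

With these example pies no single controllable component blocks all three, but several pairs do, including B and C together.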

Next time you’re carrying out a Root Cause Analysis or retrospective, try using Rothman’s Causal Pies.

Addendum: “Post-Mortem” exercises.

Even though the term “post-mortem” is ubiquitously used in the technology industry as a descriptor for analysis into root causes, I don’t like it.

Firstly, in the vast majority of tech incidents, nobody died – post-mortem literally means “after death”. It implies that a Very Bad Thing happened, but if we’re trying to hold constructive, open exercises where everyone present possesses enough psychological safety in order to contribute honestly and without fear, we should phrase the exercise in less morbid terms. The incident has already happened – we should treat it as a learning opportunity, not a punitive sounding exercise.

Secondly, we should run these root cause analysis exercises for successes, not just for failures. You don’t learn the secrets of a great marriage by studying divorce. The term “post-mortem” isn’t particularly appropriate for studying the root causes of successes.

 

I should probably highlight something about Safety I vs Safety II approaches here. I’ll add that when I have time!

 

Westrum’s Organisational Cultural Typologies

“Culture Eats Strategy for Breakfast”

A statement famously (but erroneously) attributed to Peter Drucker, which essentially means that however much you work on your strategy, you ultimately cannot ignore the “people factor”. It is people that execute your strategy and it is through people that it will succeed or fail.

People Create Culture

The most important aspect for any organisation is people and how they interact. Not strategy, not processes, not operations, and not even finance. An organisation is built of relationships between people (plus some processes, and software) and people create culture. If strategy consists of the rules of the game, culture will determine how the game is played. Culture is how people behave and communicate.

Psychological safety is often an emergent property of great organisational culture, but that doesn’t mean you can’t explicitly and purposefully work towards it and state that one of your goals for the organisation is to possess a great degree of psychological safety. Indeed, the first step in an intelligent journey to build psychological safety is often stating your goal and asking for help in getting there.

Psychological Safety and Culture

I’ve previously written about how to measure psychological safety, but measuring culture can be more challenging. Following his work in 1991 on technologies and disasters, Dr. Ron Westrum wrote in 2003 about The Typologies of Organisational Cultures that reflect how information flows through an organisation. He wrote: “organisational culture bears a predictive relationship with safety and that particular kinds of organisational culture improve safety…” That is to say, because information flow is influential and indicative of other aspects of culture, it can be used to predict how organisations or parts of them will behave when problems arise.

Westrum was focussed on real-world safety measures in the realm of healthcare and aviation, but in our technology world we should strive to adopt the same diligent approach to safety for the sake not just of the products we build but for the humans on our teams as well.

Culture is the almost intangible aspect of an organisation that so often reflects the CEO’s personality or the stance of the board members. As Westrum states:

“Culture is shaped by the preoccupations of management.”

For example, if management, particularly senior management, are most concerned about exposure to risk, the organisational culture will reflect that, with processes and checks in place to ensure risk is reduced wherever possible; this usually results in a decreased focus on innovation, lower speed to market, and a low appetite for change.

In 2015, Jez Humble, Joanne Molesky, and Barry O’Reilly wrote the book “Lean Enterprise: How High Performance Organizations Innovate at Scale”, which highlighted how critical culture is to performance, and featured Westrum’s Typology model. “Instead of creating controls to compensate for pathological cultures, the solution is to create a culture in which people take responsibility for the consequences of their actions.”

The 2016 State of DevOps Report also showed that Generative, performance-oriented cultures improve software delivery performance, alongside market share, productivity and profitability.

Westrum’s Typologies subsequently appeared in Nicole Forsgren’s book “Accelerate” in 2018, where she was able to show that generative cultures were associated with improved software delivery performance (the four Accelerate Metrics) and other organisational capabilities for learning.

Westrum’s Organisational Typologies

See the table below for Westrum’s organisational typology model of Pathological, Bureaucratic, or Generative (Westrum had previously used “calculative” but later decided that bureaucratic was better interpreted by people in organisations). Each column describes a broad cultural typology and six aspects of those cultures. It is clear from the table that the Generative culture that Westrum describes is a broadly psychologically safe culture where team members cooperate, share their fears, admit failure and continually improve.

Pathological | Bureaucratic | Generative
Power oriented | Rule oriented | Performance oriented
Low cooperation | Modest cooperation | High cooperation
Messengers “shot” | Messengers neglected | Messengers trained
Responsibilities shirked | Narrow responsibilities | Risks are shared
Bridging discouraged | Bridging tolerated | Bridging encouraged
Failure leads to scapegoating | Failure leads to justice | Failure leads to inquiry
Novelty crushed | Novelty leads to problems | Novelty implemented

The Westrum organisational typology model: How organizations process information (Ron Westrum, “A typology of organisational cultures,” BMJ Quality & Safety 13, no. 2 (2004), doi:10.1136/qshc.2003.009522.)

By surveying people across the organisation, you can establish the broad typology in which your organisational culture sits, and identify measures to improve. Ask respondents to rate their agreement on a 1-5 scale (1 being not at all, 5 being complete agreement) with the below statements:

  • On my team, information is actively sought.
  • On my team, failures are learning opportunities, and messengers of them are not punished.
  • On my team, responsibilities are shared.
  • On my team, cross-functional collaboration is encouraged and rewarded.
  • On my team, failure causes enquiry.
  • On my team, new ideas are welcomed.

These 6 statements are from Dr Nicole Forsgren’s research into high performing teams at DORA.

Each of these statements aligns with a row in the table above, so by collecting and analysing the average scores, you can quantitatively determine where your organisation resides in Westrum’s Typologies. Analyse the standard deviation of the scores to see how widely responses vary: a large spread suggests that the average hides real disagreement between respondents.

Average these scores for your summative Westrum’s Typology score. On the 1-5 scale, a score close to 1 suggests your culture is towards “Pathological”, 2-3 suggests Bureaucratic, and 4-5 suggests a Generative culture.

The individual statement scores suggest areas for improvement. For example, if your score for statement 4 is particularly low, investigate and employ practices to improve collaboration between different functional teams, ask teams what challenges they face in communication and collaboration, and facilitate informal gatherings or events where people in different teams can get to know each other.
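The scoring described above can be sketched in a few lines. This is an illustration rather than an official DORA tool: the responses are made up, and the exact cut-offs between typologies (below 2, below 4, otherwise) are my assumption, since the bands given in the text leave gaps between them.

```python
from statistics import mean, stdev

# Hypothetical responses: one row per respondent, one 1-5 rating per statement,
# in the same order as the six statements listed above.
responses = [
    [4, 3, 4, 2, 4, 5],
    [5, 4, 4, 1, 3, 4],
    [3, 3, 5, 2, 4, 4],
]


def classify(score):
    """Map an average score to a typology. The cut-offs are assumptions."""
    if score < 2:
        return "Pathological"
    if score < 4:
        return "Bureaucratic"
    return "Generative"


def summarise(responses):
    """Average each statement, then average those for the overall score."""
    columns = list(zip(*responses))  # one tuple of ratings per statement
    averages = [mean(col) for col in columns]
    spreads = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    overall = mean(averages)
    weakest = averages.index(min(averages)) + 1  # 1-based statement number
    return {
        "averages": averages,
        "spreads": spreads,
        "overall": overall,
        "typology": classify(overall),
        "weakest_statement": weakest,
    }


result = summarise(responses)
print(result["typology"], round(result["overall"], 2), result["weakest_statement"])
```

With these made-up responses the overall average falls in the Bureaucratic band and statement 4 (cross-functional collaboration) scores lowest, so that is where to look first.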

Intra-Organisational Psychological Safety

Ron Westrum describes a culture of “safety” in Generative organisations, and it’s easy to see how psychological safety is both increased in, and fundamental to, Generative cultures. Amy Edmondson, in 2008, described “Learning Organisations” in her paper “Is yours a learning organization?” and similarly suggested an assessment framework to measure how well a company learns and how adeptly it refines its strategies and processes.

There are many ways to improve the psychological safety of your team and your organisation, but sometimes as a leader, your influence may not extend very far outside of your team, and as a result, you may decide to build a high-performing, psychologically safe team within an environment of much lower psychological safety. This is admirable, and most likely the best course of action, but it is one of the most difficult places to put yourself as a leader.

Consider the “safety gradient” between your team boundary and the wider organisation. In a pathological or bureaucratic organisation, with varying degrees of toxic culture, that safety gradient is steep, and can be very hard to maintain as the strong leader of a high performing team. You may elect as your strategy to lead by example from within the organisation, and hope that your high-performing, psychologically safe team highlights good practice, and that, combined with a degree of evangelism and support, you can change the culture from “bottom-up”, not “top-down”.

This can work, and it will certainly be rewarding if you succeed, but a more effective strategy may be to build your effective team whilst lobbying, persuading and influencing senior management with hard data and a business case for psychological safety that demonstrates the competitive advantage that it can bring.

Create Psychological Safety

Take a look at my Psychological Safety Action pack, with a ready-made business case and background information, workshops and templates, to give yourself a shortcut to influencing and building a high performance, psychologically safe, and Generative organisational culture.

For any further information on how to build high performing organisations and teams, get in touch at tom@tomgeraghty.co.uk.

The History of DevOps

[Updated June 2023]

DevOps may be one of the most hyped concepts in the tech industry in recent times. Yet what it actually consists of is the subject of much debate: some describe DevOps as a culture of process improvement, whilst others describe it in purely technological terms of automation and cloud technologies.

The Origins of DevOps

What few disagree on though are its origins. In the tech industry, it has long been accepted that technologists are either “devs”: those who “create”, or “ops”: those who “build and maintain”. Developers write code while engineers build the system and keep it running. Conflict frequently emerges between these two camps and their seemingly incongruent goals –  whereas development teams are motivated and measured by their high change frequency and scale (deploying features, fixes, and improvements), operations teams are judged by reliability and consistency, qualities which are often seen as an outcome of low change frequency and scale (though we shall see later how this isn’t necessarily true). This often results in an antagonistic relationship between the two teams.

DevOps is, or at least originated as, the effort to reconcile this fracture and improve business performance.

“…all ideas are second-hand, consciously and unconsciously drawn from a million outside sources.” Mark Twain

At a high level, the practice of DevOps focuses on culture, process, velocity, feedback loops, repeatability via automation, responsiveness to change, and continuous improvement. (Also often condensed to CALMS – Culture, Automation, Lean, Measurement, Sharing). These practices have accelerated the web-scale revolution behind high-performance tech giants such as Google, Netflix, Amazon, and Facebook.

However, these concepts are not new. They have been used by industrialists, researchers, and technologists to improve the quality and efficiency of production since the dawn of the industrial revolution.

Industry and Scientific Management

In 1620 Francis Bacon codified what was to become the fundamental basis for empirical knowledge: the origin of the scientific method. Bacon’s method described the conception of a theory based upon observation, and the use of experiments to test the theory. 400 years on, we still use Bacon’s approach to create and test theories, monitor systems and check technological functionality.

“In the past, the man has been first; in the future, the system must be first.” Frederick Taylor

Frederick Taylor, in the 1880s, applied the scientific method to management and workflows to improve labour productivity. He was one of the first people to deem work itself worthy of systematic study, using the principles that Bacon derived more than 250 years before. Whilst Taylor’s views on what makes a “good” worker were somewhat disturbing – he defined the “best” worker as “so stupid and so phlegmatic that he more nearly resembles in his mental make-up the ox than any other type.” – Taylorism had a huge impact on productivity across the industrialised world.

Taylor summed up his efficiency strategies in the 1911 book “The Principles of Scientific Management.” This was voted the most influential management book of the twentieth century by Fellows of the Academy of Management in 2001. Without Taylor, it’s unlikely that Apple or Google would even exist as they do now.

20th Century Production

At the beginning of the 20th Century, most manufacturing utilised inefficient techniques – cars for instance were built the way you or I would go about the task, by assembling all the parts in one place: craft production. However, when demand for cars increased, it became clear that a form of linear, or mass, production was needed. One of the most well known examples of the production line is the one adopted by Henry Ford in 1913 for the Ford Model T, which was based on Taylor’s principles. Through the use of time and motion studies, Ford refined his production line until he had reduced the production time for a car from over twelve hours to just 93 minutes. He also introduced to mainstream manufacturing the concept of repeatability and standardisation. In contrast to Taylor, however, Ford always maintained his belief in the importance of the skill and craftsmanship of the worker.

“Without data, you’re just another person with an opinion.” William Edwards Deming

In the 1950s, William Edwards Deming, a statistician, physicist, and management consultant, began to apply statistical analysis to manufacturing. Deming found that prioritising quality over throughput would actually decrease costs and improve productivity. Whilst Taylorism and scientific management had boosted productivity, quality had suffered. Defects were sent down the line and built into finished products because workers were incentivised to ignore flaws in order to meet quotas.

He defined what is now known as the Deming Cycle: Plan – Do – Check – Act. This is similar to the software development lifecycle most of the technology industry uses today. Deming championed continual analysis and improvement of processes – one of the key tenets of DevOps.

He saw effective quality assurance as an essential function of high-performing organisations, the key message of the third of his “Fourteen Points”; key principles of management for transforming business effectiveness:

  1. Constancy of purpose, with the aim to become competitive and stay in business, and to provide jobs.
  2. Adopt the new philosophy. Embrace change.
  3. Cease dependence on inspection to achieve quality. Build quality checks and feedback loops into the process.
  4. End the practice of awarding business on the basis of lowest bid. Build long term relationships with suppliers, and value loyalty and trust.
  5. Continuously improve processes, aim to improve quality and productivity, which in turn leads to cost reductions through less wastage and higher efficiencies.
  6. Institute training on the job and integrate development into employees’ roles.
  7. Institute leadership. Leadership should help people and machines do a better job, remove barriers to working effectively, identify improvements, and develop teams.
  8. Drive out fear. Fear paralyses people and teams. Transparent communication, motivation, respect and care for each other and each other’s work will contribute to this aim.
  9. Break down barriers between departments. Cross-functional teams can solve problems more easily and effectively than single-function teams or siloes.
  10. Eliminate slogans and exhortations for the workforce asking for zero defects. Defects (and quality) are a result of the system, not the individual.
  11. Eliminate targets or quotas. Substitute quality for quantity, and quantity will follow.
  12. Permit pride of workmanship. Eliminate management by objective or by numbers. Employees feel more satisfaction when they get a chance to execute their work well and professionally, rather than trying to meet a quota.
  13. Institute training and self-improvement. Encourage employees to study for themselves and to see their studies and training as a self-evident part of their jobs.
  14. The transformation is everyone’s job. Transformation happens only when everyone in the organisation works to accomplish it.

Deming’s System of Profound Knowledge is the culmination of his work and ties together his seminal theories on quality, management and leadership into four interrelated areas:

  1. Appreciation for a system
  2. Knowledge of variation
  3. Theory of knowledge
  4. Psychology

Each area corresponds to one or more of his fourteen points, and we can reflect on how these four areas correspond to fundamental DevOps tenets too.

Appreciation for a system means that as a leader, engineer, developer or tester, you ought to understand the system that you are looking to work within – and that thoroughly understanding that system endows you with far greater capacity to improve it. This is systems thinking, a concept which will be revisited throughout this book.

Knowledge of variation refers to two types of “cause” determined by Deming: “Common” and “Special”. Common causes are those anticipated by, or inherent to, a system. An example of this would be scaling: you might know that a particular system generates logs at a rate of 500 GB per day, and as a result you build functions into your system to deal with this growing demand for storage. This growth (the “cause”) is understandable and predictable, and thus you are able to implement measures to manage the variation. Deming’s second cause is “Special”, and refers to those aspects that are unknown or unpredictable, such as a change made that had unintended consequences, or a datacentre outage, or action by a malicious third party. Deming estimated that over 94% of quality issues (in his case, in manufacturing, but the same principle applies to modern software delivery) are catalysed by “common” causes, but human nature looks for the “special” cause: the one-off event, the human error, or bad actor at play. If someone accidentally shuts down a production server, Deming’s solution is not to fire the human (thereby removing the unpredictable, unknown element), but to build improvements into the system to prevent a human making that mistake again, or to prevent that mistake from affecting the system.
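One widely used way to separate the two kinds of variation, which the text above does not describe, is a Shewhart-style control chart: points inside limits derived from a stable baseline are treated as common cause variation, and points outside them are candidates for a special cause. The sketch below is a simplification under stated assumptions: the daily log volumes are invented, and the limits use the plain sample standard deviation rather than the moving-range calculation a formal individuals chart would use.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Lower and upper limits from a baseline period assumed to be stable."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def special_cause_candidates(observations, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, upper = limits
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical log volume in GB per day during a normal week or so.
baseline = [498, 502, 505, 497, 500, 503, 499, 501]
limits = control_limits(baseline)

# Hypothetical new observations: day 2 is far outside the usual range.
new_days = [504, 496, 640, 501]
print(special_cause_candidates(new_days, limits))
```

Only the 640 GB day falls outside the limits here; the other days vary, but within what the system normally produces, so reacting to them individually would be chasing noise.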

Deming’s theory of knowledge concentrates on the importance of understanding our own knowledge. How do we discern what is true from what is false? How do we identify our own innate biases, and how can we make ourselves less susceptible to confirmation bias? Deming goes back to Bacon’s scientific method with the Plan-Do-Check-Act cycle, reflecting the concept of creating a hypothesis and then testing those assumptions. People appear to learn more effectively when they make predictions. Making a prediction forces us to think ahead about the potential outcomes and also causes us to examine more deeply the system that we’re working in or on.

At around the same time, after studying consumer behaviour in supermarkets, the Toyota Motor Corporation began using Kanban (which means “signboard” in Japanese) to control and record work. Kanban boards have vertical columns with work packages in the form of cards to represent stages in a process. Each process is a “customer” of the preceding process to the left – that is, the work is “pulled” from left to right, rather than “pushed”. This concept reduces inventory pile-up, enabling a delivery system called just in time and minimising waste. It also aids the identification of bottlenecks in the process by highlighting Work In Progress (WIP). Kanban makes “work” visible. And making work visible is crucial to further improvement, because “you can’t manage what you don’t measure”.

“Any improvements made anywhere besides the bottleneck are an illusion.” Eliyahu M. Goldratt

The quote above sums up Goldratt’s Theory of Constraints. In his 1984 management novel “The Goal”, Eli Goldratt built on Deming’s ideas and codified Lean Production, a precursor of DevOps methodology. He described a failing manufacturing plant where Alex, the main character, is given three months to turn things around. Through a series of telephone calls and meetings with an acquaintance called Jonah (another physicist, like Deming), Alex solves the organisation’s problems by utilising pull rather than push processes, reducing WIP, and employing the Theory of Constraints. “The Goal” itself, Goldratt demonstrates, is simply to make money for the business. Anything else, if it cannot be demonstrated to help make money, is likely to be vanity.
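Goldratt’s point about bottlenecks can be shown with simple arithmetic: in a serial process, throughput is capped by the slowest stage, so speeding up any other stage changes nothing. This is a minimal sketch with hypothetical stage names and capacities, and it ignores queues and variability.

```python
def throughput(capacities):
    """Units per hour a serial pipeline can deliver: capped by its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities):
    """Name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)


# Hypothetical capacities in work items per hour.
stages = {"build": 30, "test": 8, "deploy": 20}

faster_build = {**stages, "build": 60}  # improve a non-bottleneck stage
faster_test = {**stages, "test": 16}    # improve the bottleneck itself

print(bottleneck(stages), throughput(stages))
print(throughput(faster_build))
print(throughput(faster_test))
```

Doubling build capacity leaves overall throughput at 8 items per hour, while doubling test capacity raises it to 16.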

People and Process

By the 1980s, the modern manufacturing revolution was in full swing. However, its often reductionist approach to workers wasn’t helpful, and staff turnover was high. Among those to recognise this was Burrhus Frederic Skinner, a psychologist, author, inventor and the Edgar Pierce Professor of Psychology at Harvard University. In describing the nature of quality work and happiness, he said:

It’s the difference between a craftsman who makes a complete chair and a person on an assembly line who makes only the legs. The craftsman’s work is constantly reinforced by the process of seeing the chair take form, and finally of producing the finished chair. But the assembly-line worker sees only chair leg after chair leg — never the completed product.

This is a near-definitive example of “systems thinking” – another key tenet of DevOps.

Being able to see the end result of the process is key to improving quality in the individual stages – how can someone build the perfect component if they don’t understand the final product? Systems thinking is a cultural practice, rather than a process or tool, and relies on believing in the capability of team members to make small but important decisions regarding their part in the process, and thus being more invested in the outcome.

Photo by Lenny Kuhne on Unsplash

Further developments in understanding of how to develop an aspirational working culture came once again from Toyota when in 2001 they defined their philosophy, values and manufacturing ideals in four key headlines, “The Toyota Way”. These were:

  1. Long-Term Philosophy – Base your management decisions on a long-term philosophy, even at the expense of short-term goals.
  2. The Right Process Will Produce the Right Results – Focus on pull processes, managing WIP, and making work visible.
  3. Add Value to the Organization by Developing Your People – Provide effective training, highlight team success over individual success, and challenge your partners and suppliers.
  4. Continuously Solving Root Problems Drives Organizational Learning – Continuously improve (in Japanese, kaizen), use the “5 whys” to get to the root cause of problems, standardise, decide slowly and act quickly, and encourage a knowledge sharing culture.

Everything in The Toyota Way and Lean Production aligns with, and indeed forms part of, the DevOps principles.

The Agile Manifesto

Also that year, at Snowbird resort in Utah, seventeen developers, frustrated with traditional heavyweight project management methodologies, came up with the Agile Manifesto. At the time, industry experts estimated that the time between a validated business need and an actual application in production was around three years. There was a real desire to find more lightweight ways to deliver value from technology, faster. The Agile manifesto is as follows:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

The Agile Manifesto gives a clear guide to what to prioritise. For example, whilst documentation is valuable, it is more important to the business that the software works. The most well-known element of Agile is possibly the fourth line: Responding to change over following a plan. Given how quickly customer requirements, finances, and technology can change, it is often unrealistic to believe that specifications created at the start of a project will remain 100% accurate and true throughout the lifetime of the project. Thus, responding to change is one of the ways that software teams can provide a competitive edge over teams that do not.

Whilst Agile methodology is not fundamentally part of DevOps, the two usually go hand-in-hand. In technology teams, one is certainly easier to achieve in the presence of the other.

The First DevOps “Role”

Shortly after the Agile manifesto was written, Google was undergoing rapid expansion. As one of the few web-scale tech businesses at the time, they experienced the unprecedented challenge of trying to rapidly introduce new features whilst maintaining a highly complex, always-on, massive scale platform. The Site Reliability Engineering (SRE) team, led by Ben Treynor, was their solution.

A Site Reliability Engineer (SRE) would typically spend up to half their time performing operations-related work such as troubleshooting system issues and performing maintenance. They would spend the other half of their time on development tasks such as new features, scaling challenges, or automation. An SRE is an example of one of the first true DevOps roles in technology.

DevOps Detractors

“…the opportunities for gaining IT-based advantages are already dwindling… And as for IT-spurred industry transformations, most of the ones that are going to happen have likely already happened or are in the process of happening.” Nicholas Carr

It’s worth noting that not all in business recognised the potential of DevOps. In May 2003, Nicholas Carr published an article in the Harvard Business Review, titled “IT doesn’t matter.” In this now infamous piece, Carr defines IT as a commodity, in the same category as electricity or water. He suggests that being the first to utilise a particular technology provides only a small competitive advantage, since your competitor can purchase the same system or replicate the same technology, but you incur the lion’s share of the cost by doing it first. He stated:

The key to success, for the vast majority of companies, is no longer to seek advantage aggressively but to manage costs and risks meticulously. If, like many executives, you’ve begun to take a more defensive posture toward IT in the last two years, spending more frugally and thinking more pragmatically, you’re already on the right course. The challenge will be to maintain that discipline when the business cycle strengthens and the chorus of hype about IT’s strategic value rises anew.

Carr’s piece was taken very seriously at the time, and still is by many business leaders. Perhaps it is fortunate for organisations such as Salesforce and Google that they pursued technology as a competitive advantage, and disregarded Carr’s advice.

Improving IT

It is not surprising, however, that technology had such a poor reputation at the time, since research suggests that at least 80% of outages were (and potentially still are) self-inflicted. A book by Kevin Behr, Gene Kim and George Spafford, The Visible Ops Handbook (2004), described a methodology to improve operational IT. This methodology of “Visible Ops” is described in four stages:

  1. Stabilize Patient, Modify First Response – This first step controls risky changes and reduces MTTR (Mean Time To Resolution).
  2. Catch and Release, Find Fragile Artifacts – Here assets, configurations and services are inventoried in order to identify those with the lowest change success rates, highest MTTR and highest downtime costs.
  3. Establish Repeatable Build Library – This creates repeatable builds for critical services, to make it “cheaper to rebuild than to repair.”
  4. Enable Continuous Improvement – This implements metrics to enable continuous improvement of processes.

To some degree, these four stages are evolutions of elements of The Toyota Way. They formed an embryonic codification of what was to become the principles of DevOps.

Over the next few years, the technology industry underwent a paradigm shift, where methods of working were analysed, and technology became far more fundamental to the success of organisations (possibly to the chagrin of Nicholas Carr).

#DevOps

In 2008, the term DevOps was used in the industry for the first time. There’s some confusion and misinformation regarding how this came about, but I spoke to Andrew Clay Shafer and Patrick Debois, both widely credited with creating the term “DevOps”, to get the full story…

In August 2008 at the Agile Conference in Toronto, software developer Andrew Clay Shafer posted notice of a discussion group session entitled “Agile Infrastructure.” Just one person, system administrator Patrick Debois, attended. Debois had become frustrated by the now ubiquitous conflicts between developers and operations while working on a data centre migration for the Belgian government and was looking for solutions. Shafer actually skipped his own session because he didn’t think anyone was interested, but Debois later tracked him down for a chat in the hallway. Inspired by that hallway discussion, they formed an “Agile Systems Administration” Google Group.

In November the following year, 2009, Patrick organised the first DevOpsDays conference in Belgium, though it was Shafer who (it’s believed) coined the term DevOps by tweeting using the #DevOps hashtag at the Velocity conference in June 2009 whilst watching the now famous “10 deploys a day” talk by John Allspaw and Paul Hammond of Flickr.

The Role of Cloud Technology

It wasn’t long after the #DevOps hashtag was first used that adoption of cloud technology accelerated rapidly. The AWS EC2 service (virtual servers on-demand) only went out of beta in late 2008. It was (and still is) a fast evolving technology. Cloud technology tends to align well with DevOps practices, because its features lend themselves to elasticity and scaling, automation, measurement and repeatability, key fundamentals of DevOps.

The tide had turned. Increasingly, organisations began looking at ways of improving software deployments, moving away from large, disruptive (and frankly, stressful) deployments, towards a model of more frequent, smaller, low-risk deployments.

Jez Humble and Dave Farley wrote what is still one of the definitive texts on this approach: “Continuous Delivery” in 2010. It describes in detail how to automate your build, deployment, and testing pipeline so that you can release changes in hours or even minutes. That might not seem that impressive today, but at the time, a release cycle of months or years was very common.

Continuous delivery, according to Farley and Humble, requires:

  • Comprehensive configuration management
  • Continuous integration and short lived branches (in reference to Trunk-Based Development)
  • Continuous testing

The automation of the build, deployment, and testing process, coupled with better collaboration between development, test, and ops teams, means that changes can be released rapidly. These smaller, low risk changes are more easily rolled back should something go wrong. “Continuous Delivery” showed how to increase velocity of change, whilst reducing risk and improving quality.

With cloud technology becoming mainstream and a desire to release software more rapidly, automation technology and tools took off. Software firms such as Puppet and Chef grew fast as developers and engineers strove to streamline their build processes and manage ever-increasing scales of infrastructure in the cloud. These tools also provided a new ability to fire up duplicate environments, such as staging, QA, test and validation, within minutes rather than weeks or months. Organisations exploiting these automation tools and using native cloud technologies felt that they were gaining significant competitive advantage by doing so, and what evidence there was, was in their favour. Even Gartner, in a 2011 report, stated that:

“By 2015, DevOps will evolve from a niche strategy employed by large cloud providers into mainstream strategy employed by 20% of Global 2000 organisations.” Gartner, March 18, 2011.

In the same report, Gartner recognised that ITIL and other “top-down” best practice frameworks had not delivered on their goals, and IT organisations were looking for something new. They understood that because DevOps was primarily a cultural shift, driven from the ground up, it could prove far easier for technology departments to adopt than ITIL or similar frameworks.

The Codification of DevOps

Two years later, Gene Kim, Kevin Behr and George Spafford wrote The Phoenix Project, a novel about a failing organisation struggling to meet the demands of modern technological complexity and competition. This novel inspired technology leaders and engineers alike, because it described with eerie familiarity what it was like to work in a technology organisation with poor change control, problematic “Ops vs Devs” cultures and inadequate visibility and monitoring of work or performance.

The Phoenix Project was inspired by The Goal by Eli Goldratt. It demonstrated a number of actionable ways to improve the performance of your IT organisation, such as effective (but lean) change control, effective (and again, lean) testing, reducing WIP and unplanned work, and avoiding letting anyone become the bottleneck for processes. The “bottleneck person” in the book is Brent, a character who knows everything but hasn’t documented anything. A key message of the book? Don’t be Brent.

Gene also introduces in the Phoenix Project one of the first efforts to codify DevOps, using “The Three Ways”:

  • Flow (or Systems Thinking)
  • Feedback Loops
  • Continuous Improvement

These “Three Ways” are concepts that echo the Toyota Way, Deming’s “Plan-Do-Check-Act” cycle, and other best practices, made specific to the DevOps context.  Gene’s subsequent book, written with Jez Humble (of “Continuous Delivery”), Patrick Debois and John Willis in 2016, “The DevOps Handbook”, goes deeper into the technical application of The Three Ways. It explores how to measure what matters to the business, and how to implement technical processes such as Continuous Integration and Continuous Delivery.

It didn’t take long to realise that there was another functional silo with a somewhat different set of interests than Dev or Ops: Security. Security, the IT profession realised, should be built into code as it is developed rather than added later on by a different team. Predictably, the idea became known as DevSecOps.

Gene also introduces the concept of “DevSecOps” – the integration of DevOps practices into the application of information security. If The Phoenix Project was the “why” to do DevOps, The DevOps Handbook provides the “how”.

Measuring DevOps

Given that DevOps is at least partly about effective measurement and continuous improvement, it’s self-evident that we, as an industry, should measure the success of DevOps itself. In 2012, Puppet began surveying people working in technology to understand the adoption and development of DevOps practices. They published “State of DevOps” reports which focussed on twenty key capabilities. These fall along familiar categories:

  • Technical (version control, test automation, deployment automation, trunk-based development)
  • Process (WIP limits, visual management, visualisation of the value stream)
  • Cultural (team culture, learning cultures, and job satisfaction).

Now taken over by DORA (DevOps Research and Assessment), an organisation created by Nicole Forsgren, Gene Kim, Jez Humble and Soo Choi, the State of DevOps Report is being improved every year. According to Alanna Brown at Puppet, they “have built the deepest and most widely referenced body of DevOps research available, drawing on the experience of more than 30,000 technical professionals around the world.” The data from these reports demonstrates that Carr’s view of IT as a cost centre was misguided. It is clear that IT is a powerful driver of value to an organisation where velocity, security and stability are essential for success.

“…software delivery is an exercise in continuous improvement, and our research shows that year over year the best keep getting better, and those who fail to improve fall further and further behind.” Nicole Forsgren

On the back of the last four years of State of DevOps reports, Nicole Forsgren wrote the illuminating book “Accelerate”. It explains which metrics correlate to organisational performance, and what you should measure in order to find out where and how to improve.

Forsgren states that the key metrics separating high from low performers in tech organisations are:

  • Deployment frequency (and pain!)
  • Lead time for change (from code commit to code deploy)
  • Mean Time To Restore (MTTR)
  • Change failure rate

Interestingly, the first two of these metrics are throughput (traditionally development-oriented) measures; the last two are stability (traditionally operations-oriented) measures.
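To make these four metrics concrete, here is a minimal Python sketch of how they might be calculated from a team’s own deployment history. The `Deployment` record, its field names, and the `four_key_metrics` function are all hypothetical, invented for this illustration; they don’t come from “Accelerate” or from any particular tool, and the sample figures are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape for illustration)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], period_days: int) -> Dict[str, float]:
    """Summarise the four metrics over a reporting period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    # Lead time for change: code commit to code deploy, in hours.
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    # Time to restore: measured here from the failed deployment to recovery, in hours.
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(lead_times),
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else 0.0,
        "change_failure_rate": len(failures) / len(deployments),
    }


# Made-up sample data: four deployments in one week, one of which failed.
start = datetime(2019, 1, 7, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               restored_at=start + timedelta(hours=5)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=3)),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=3)),
]
metrics = four_key_metrics(sample, period_days=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In practice the timestamps would come from your version control, deployment, and incident tooling, and you would probably want medians or distributions rather than simple means. Even a rough calculation like this, though, gives a team a baseline to improve against.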

The State of DevOps in 2019

As of the 2018 State of DevOps report, the findings consistently show that:

  • Software delivery and availability unlock competitive advantages.
  • How you implement cloud infrastructure matters.
  • Use of open source software improves performance.
  • Outsourcing by function is rarely adopted by elite performers and hurts performance.
  • Key technical practices drive high performance (e.g. monitoring, automated testing, security integration).
  • Industry doesn’t matter when it comes to achieving high performance for software delivery.

The statistics show that the high performers exhibit 46 times more frequent code deployments than low performers. They have a 7 times lower change failure rate, over 2,500 times faster lead time from code commit to deployment, and are over 2,600 times faster to recover from incidents.

When an organisation can deploy quickly, recover rapidly, and suffer few outages, it has the ability to reach the market before competitors and respond to customer demand quickly. It will also provide more stable and secure service. This results, ultimately, in Goldratt’s “Goal”, making more money for the business.

Such a state is not reached by simply automating, using cloud technology, or recruiting a “DevOps Engineer” – it is the culmination of great team culture, continuous improvement, feedback loops, systems thinking, and a rigorous approach to using the right technology. DevOps is not a framework (like ITIL), an industry standard, a suite of tools, or a job title.

DevOps encompasses the culture, technologies, tools, skills and processes that enable organisations to go from idea to production as rapidly as possible, incurring low risk and cost, and providing high security and reliability at scale.

The definition of DevOps itself is continually evolving and improving, and while I may offer a definition as above, it will be out of date within days of writing, because, like the technology and services we build, it is continuously in flux, and being improved by the same people practising it.

Where does DevOps go next? I believe that the scope of DevOps needs to widen. As mentioned above, a large reason why DevOps is so successful is that it’s a ground-up movement, created and progressed by the actual people doing it (unlike ITIL, for example). However, this has meant that DevOps, naturally, focusses tightly on the technological functions of an organisation.

The next phase of DevOps includes practices and approaches such as “Platform as a Product” and also broadens the scope of DevOps to the wider organisation, evolving into “digital transformation” using Andrew Clay Shafer’s 5 Elements, Jabe Bloom’s Three Economies, and the broader, cross-sectoral concepts of resilience engineering in sociotechnical systems.

2023 Update: Safety Cultures and Platform Engineering

Over the past two to three years, DevOps has seen further maturity and adaptation to new norms, driven by unprecedented global circumstances and evolving technological trends. A particularly noteworthy shift has been the focus on building robust ‘Safety Cultures’. This approach emphasizes the creation of an environment where experimentation is encouraged, failures are seen as opportunities for learning, and psychological safety is paramount. Teams are given the latitude to innovate while knowing that missteps are not only tolerated but expected as part of the process of continuous improvement. This aspect has greatly enhanced the resilience of DevOps, fostering a more transparent, responsive, and adaptive culture.

Platform Engineering has also been a rising trend, presenting a shift in how organizations perceive their development infrastructure. Rather than treating platforms as a collection of tools and services, they are viewed as integrated products that evolve with the needs of the end-users, who are the developers. This perspective empowers developers, reduces overhead, and ultimately accelerates the delivery of value to the business.

The COVID-19 pandemic brought its own set of challenges and lessons. The necessity of remote working underlined the importance of strong communication channels, reliable cloud-based tooling, and the autonomy of distributed teams. It revealed the strength of DevOps practices in enabling organizations to maintain their pace of innovation even in the face of major disruptions. Companies that had already embraced DevOps were better positioned to navigate the transition to a remote work environment, demonstrating the value of adaptability inherent in the DevOps philosophy.

As we look to the future, the trajectory of DevOps and related methodologies appears more integrated and comprehensive. The focus will likely continue to expand beyond the technological realm, permeating deeper into business strategies and driving broader digital transformation initiatives. The trends of Safety Cultures and Platform Engineering are expected to solidify, with even greater emphasis on psychological safety, learning from failures, and treating internal platforms as products.

Furthermore, the remote working lessons from the pandemic will likely catalyze a shift towards more distributed, asynchronous ways of working. We may see a rise in ‘RemoteOps’, an evolution of DevOps practices adapted for a world where remote and flexible work arrangements become the norm. In this era, principles of effective remote communication, time-zone friendly practices, and trust-based management will become critical. In essence, the future of DevOps is about expanding its boundaries, integrating more closely with business goals, and continually evolving to meet the demands of our ever-changing world.

Measuring Psychological Safety in your Team

We know psychological safety is crucial for high performance teams, and particularly so for technical delivery teams. Innovation is so critical for creating products that delight customers and serve critical business needs, and psychological safety is a fundamental enabler of innovation.

Below are ten questions that you can ask yourself or your teams to determine the level of psychological safety in your team. Rate agreement with the below statements on a scale of 1 to 5, with 5 being “completely agree” and 1 being “completely disagree”.

When carrying this exercise out with your team, perform the survey anonymously – if it’s possible that your team are psychologically unsafe, they will be more likely to be honest if the survey is anonymous. If the team are very psychologically safe, then it won’t matter if the survey is anonymous or not.

It is also important to allow for qualitative, verbose feedback for each question as well, because that verbose feedback will facilitate and clarify some of the actions that you may need to take in order to improve these scores.

  1. On this team, I understand what is expected of me.
  2. We value outcomes more than outputs or inputs, and nobody needs to “look busy”.
  3. If I make a mistake on this team, it is never held against me.
  4. When something goes wrong, we work as a team to find the systemic cause.
  5. All members of this team feel able to bring up problems and tough issues.
  6. Members of this team never reject others for being different and nobody is left out.
  7. It is safe for me to take a risk on this team.
  8. It is easy for me to ask other members of this team for help.
  9. Nobody on this team would deliberately act in a way that undermines my efforts.
  10. Working with members of this team, my unique skills and talents are valued and utilised.
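If you collect the scores in a spreadsheet or form tool, a few lines of Python are enough to tally them. The sketch below is one possible approach rather than a prescribed method: the function names, the sample responses, and the threshold of 3.0 for “needs attention” are all assumptions made for this illustration. Note that only scores are stored, never names, which keeps the results anonymous.

```python
from statistics import mean

QUESTIONS = 10   # the ten statements above
LOW_SCORE = 3.0  # assumed threshold for "needs attention"; adjust for your team


def summarise(responses):
    """Return the mean score per question across all anonymous responses.

    `responses` is a list of answer sheets, each a list of ten integers (1 to 5).
    """
    for sheet in responses:
        if len(sheet) != QUESTIONS or any(not 1 <= score <= 5 for score in sheet):
            raise ValueError("each response needs ten scores between 1 and 5")
    # zip(*responses) regroups the scores by question rather than by respondent.
    return [mean(scores) for scores in zip(*responses)]


def needs_attention(averages, threshold=LOW_SCORE):
    """Return the 1-based numbers of the questions whose mean is below the threshold."""
    return [number for number, avg in enumerate(averages, start=1) if avg < threshold]


# Made-up responses from three team members, one list of ten scores each.
responses = [
    [5, 4, 2, 4, 3, 5, 2, 4, 5, 4],
    [4, 4, 3, 5, 2, 5, 3, 4, 5, 3],
    [5, 3, 2, 4, 3, 4, 2, 5, 4, 4],
]
averages = summarise(responses)
print(needs_attention(averages))  # prints [3, 5, 7]
```

With this sample data, questions 3, 5 and 7 (mistakes, raising tough issues, and taking risks) score lowest, which suggests where to focus first. The written feedback for those questions is where you would look next.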

To explain the context behind each question:

1 – On this team, I understand what is expected of me.

It is essential that team members understand what is expected of them in terms of delivery (speed, quality, cost, and other factors) and behaviour (everything from dress code and punctuality to coding standards) to foster psychological safety. Ensure tasks are clear and well defined, behaviour expectations are explicit, and negative behaviours are dealt with.

2 – We value outcomes more than outputs or inputs, and nobody needs to “look busy”.

Outcomes (such as revenue generated or satisfied customers) matter more than outputs (emails sent, lines of code written, or meetings attended). If the team focus on what truly matters to the business, they are safe to make decisions that can improve outcomes, even if those decisions reduce output. The ideal is a team that possesses enough psychological safety to decide not to do something that could make them look good in the eyes of others, but doesn’t deliver outcomes for the business.

3 – If I make a mistake on this team, it is never held against me.

A psychologically safe team will never blame a member of the team for a genuine mistake if their intentions were good. Indeed, by enabling mistakes to be made without a fear of blame, you enable innovation and risk taking that can drive your organisation ahead of the competition. Utilise systems thinking and DevOps approaches to prevent mistakes before they happen or mitigate the impact of mistakes when they do.

4 – When something goes wrong, we work as a team to find the systemic cause.

Related to the previous point but important enough to warrant its own question, a system of discovering the root causes of mistakes and failures means that not only do team members feel able to take risks without being blamed, but every single “failure” is an opportunity for learning and improvement. By building psychological safety through these retrospective exercises, everyone on the team gets to learn from mistakes, meaning mistakes are a gift, not a threat.

5 – All members of this team feel able to bring up problems and tough issues.

In a psychologically safe team, all members of the team are able to bring up problems and tough issues, ranging from personal struggles to concerns about other (even senior) members of the team. This psychological safety is crucial for allowing both vulnerability to show when you’re struggling and need help, and courage to raise difficult topics.

6 – Members of this team never reject others for being different and nobody is left out.

Evidence shows that diversity in a team results in higher quality products and happier team members, but diversity in itself is not enough: it is crucial that team members are all included in decision making and delivering results. To facilitate psychological safety (and high performance) every member of the team needs to be invested in the decisions made and the outcomes generated. This is particularly crucial for remote and distributed teams, where it is more difficult to see if a team member is becoming disengaged.

7 – It is safe for me to take a risk on this team.

Mistakes happen unintentionally, but risks are about taking actions that might not work, or may have unintended consequences. Psychological safety provides the framework for positive risk-taking, enabling innovation and ultimately, competitive advantage.

8 – It is easy for me to ask other members of this team for help.

In psychologically unsafe teams, team members try to hide their perceived weaknesses or vulnerabilities, which prevents them from asking for help. In a psychologically safe team, members prioritise the team goals over individual goals. Helping others helps achieve the team goal, and because team members feel safe to ask for that help, psychologically safe teams achieve more of their goals than unsafe teams.

9 – Nobody on this team would deliberately act in a way that undermines my efforts. 

In an unsafe team, members compete with each other to achieve their individual goals, and may even undermine other team members if it could benefit them or it is perceived that doing so may elevate their “rank” within the team or organisation. In a psychologically safe team, that counter-productive competition doesn’t exist, and the success of the team is more important than looking good in the eyes of others.

10 – Working with members of this team, my unique skills and talents are valued and utilised.

We all bring our own unique experience, skills and knowledge to the teams that we’re in, but we also bring our own prejudices and biases. In a psychologically safe team where members are valued for being their true selves, biases are less likely to manifest. Indeed, team members may feel safe enough to identify, raise, and discuss their own biases or those of other team members. By doing so, we provide space for each individual to maximise their potential from utilising their own unique skills and talents.

Regularly Measuring Psychological Safety

By measuring the degree of psychological safety on your team, you can begin to build your own unique strategy for developing and maintaining it. For instance, this may involve running more regular retrospectives or workshopping the team’s values and behaviours.

Measurement is only a tiny part of the process. Download a complete Psychological Safety Action Pack full of workshops, tools, resources, and posters to help you measure, build, and maintain Psychological Safety in your teams.

Remember to be patient: this is a journey, not a destination, and work on your own psychological safety too. You can’t effectively help others if you don’t look after yourself.

Psychological Safety in Remote Teams

A sudden adoption of remote working.

In early 2020, due to the COVID-19 outbreak, many organisations around the world went through a sudden digital transformation and many teams became remote. With this near-instant operational pivot to distributed and remote teams, organisations and the people within them encountered new and difficult challenges such as poor internet connectivity, inadequate home offices, and trying to manage simultaneous family and work life.

One of the biggest challenges is the impact of being physically distant from our teammates on our psychological wellbeing. Distributed teams have fewer opportunities for spontaneous, casual conversation; team members have more difficulty picking up non-verbal cues in conversation, and people are more likely to feel alone, anxious, unsure of what to do, and may even experience self-doubt or imposter syndrome.

Fundamental requirements for high performing teams

Psychological safety is the number one requirement for high performing teams. Without it, a team will never achieve high performance and the members of that team will not be able to realise their full potential. Now that many of our teams are distributed and remote, psychological safety is even more difficult to build and maintain.

Here are ten things you can do, whether you’re a leader or a member of your team, to help foster and build psychological safety, and increase the performance and happiness of your team and yourself.

Ten key actions to improve psychological safety in remote teams.

1. Set the stage.

We’re all going through difficult times, whether it’s financial concerns, supporting vulnerable friends and relatives or just dealing with the mental load of what’s happening in the world. Be honest about this with your team. Be explicit about the challenges ahead, and show your vulnerability. Without you showing vulnerability, your team will be unlikely to, and it’s a key part of building psychological safety. Be positive and enthusiastic about facing these challenges. 

2. Make sure everyone knows what to do.

Knowing what to do, when to do it, and what good looks like is crucial for remote team members. It’s far more difficult to ask for advice or assistance when remote, and self-doubt will creep in quickly. So make sure team members know what is expected of them, and ensure that workloads and deliverables are realistic. 

3. Focus on outcomes, not outputs. 

Outcomes matter more than anything else. Whether your desired outcome is satisfied customers, revenue generated, uptime, or something else, focus on that, and ensure the team remain focussed on it. Resist the temptation to revert to more traditional, “lazy” styles of management by measuring outputs, lines of code written, story points completed or meetings attended. And certainly avoid falling back to input-driven management by logging hours worked – we already know that is a route to reduction of psychological safety and it’s the last thing a distributed team needs. 

By keeping the team focussed on what really matters to the business, psychological safety will be improved, because team members will know that their hard work makes a difference, and they can contribute to the success of the organisation.

4. Build a culture of appreciation.

When we’re all in the same place, appreciation and thanks are much easier to communicate and tend to be passive or automatic. With distributed teams, much more effort needs to be made to ensure team members feel valued and appreciated. This means being much more explicit with appreciation, and communicating it in multiple ways such as through video calls, emails, and instant messaging. It’s very easy to forget how often we thank each other when we’re co-located, and without that culture of appreciation, psychological safety will suffer.

5. Embrace routine and ritual.

The dramatic shift in ways of working has resulted in disruption to our routines – our start and finish times, any regular meetings, and lunch breaks have all been disrupted. Routines help us as humans feel more comfortable and psychologically safe when the world around us is changing and there is so much uncertainty elsewhere. 

Ritual also plays an important role in team cohesion, particularly so with distributed and remote teams. Every team will have its own rituals and ceremonies, from ringing a bell at a sprint kickoff, to having end-of-week drinks on a video call. Whatever the rituals are, keep them up in order to build psychological safety.

6. Establish work boundaries.

Work has invaded our homes and our personal space and time. It’s very easy to allow work to spread out, particularly if strict boundaries are not set. Help your team set these boundaries, and enforce and model them. This may be ensuring that team members can turn off their phones after 6pm without worrying about missing important messages, or purchasing home office equipment so they don’t need to work from their kitchen table. 

To maintain psychological safety, team members need to be able to remove themselves from work and maintain their own personal, home and family space.

7. Use the many species of video call.

Video calls aren’t just for meetings. To bring back the feeling of cohesion and togetherness that is so important for psychological safety, try out different kinds of video call, such as “good morning” meetings to start the day, or an “always-on” watercooler style meeting where people can drop in and out as desired. Feeling more connected to teammates will build psychological safety and improve communication.

8. Be actively inclusive, or risk being passively exclusive.

In an office setting, it’s easy to see if someone is not engaged or is pulling away from the team. With a distributed team, this is far more difficult, even on video calls.

A critical stage of psychological safety is “contributor safety” – everyone needs to contribute if the team is to achieve high performance, and in distributed and remote teams, if you’re not being actively inclusive, you’re risking being passively exclusive. To build psychological safety, invite participation, ask questions, and always ensure that everyone has spoken at least once before ending a meeting.

[Image: one person withdrawn from the group]

9. Adopt Hanlon’s razor.

In The Sorrows of Young Werther, first published in German in 1774, Johann Wolfgang von Goethe wrote: “Misunderstandings and lethargy perhaps produce more wrong in the world than deceit and malice do. At least the latter two are certainly rarer.” A similar sentiment was later attributed to Robert J. Hanlon, hence “Hanlon’s razor”.

That is to say, it is important to assume the best intentions. If an email or message comes across as rude, blunt, or offensive, assume it was a miscommunication or misunderstanding. If in doubt, ask for clarification, ideally via video or voice.

To help others avoid falling into the same trap, embrace emojis and GIFs in your communications, even if they’re not your usual style. Emojis and GIFs can help build and maintain psychological safety by ensuring that your communication is received in the most positive way possible.

[Image: a smiley emoji helps to reassure intention]

10. Put your own oxygen mask on first.

If you’re struggling with your own psychological safety, you will not be as effective in helping others with theirs. Find a mentor to advise and help you, eat healthily (but remember to treat yourself), exercise, meditate, and take time away from work; essentially, do whatever you know helps you maintain a happy and healthy approach and pace of work. As leaders of teams, many of us get so focused on caring for our team members that we minimise or neglect our own needs, but if you don’t look after yourself, you can’t look after others.

Take your time.

Finally, be patient. These are difficult times, and it’s to be expected that we will all experience challenges that impact our psychological safety and that of our team members. Utilising the ten behaviours above will help you and your team maintain psychological safety and improve not just team performance, but happiness too. Remember, happy teams aren’t happy because they’re high performing: they’re high performing because they’re happy.

Check out information about how to measure psychological safety in your teams here.

Download a complete Psychological Safety Action Pack full of workshops, tools, resources, and posters to help you measure, build, and maintain psychological safety in your teams.

For more information about building psychologically safe teams, read more about DevOps and psychological safety, read about high performing teams and psychological safety, or get in touch if you’d like me to speak or work with you.