Tom Geraghty - DevOps, Leadership and Psychological Safety

November 23, 2021 (updated February 3, 2023)

Platform as a Product

There’s a lot of information (and misinformation) on the concept of “Platform as a Product” in respect to current thinking in DevOps and organisational dynamics, and this is where I’m gathering my thoughts on that.

The core premise of “Platform as a Product” is to make explicit the need for a platform (low variability) to exist as a separate system from the customer-facing products (valuable variation), and the need for a long-lived platform team, practices, and budget to support it. Just to dive into this briefly: a platform is designed similarly to the manufacturing production line below – we want a platform to provide consistency and reliability (low variability). Indeed, it’s the consistency and reliability provided by a platform that enables a customer-facing product team to deliver products demonstrating high variation – which means we can rapidly deliver and test new features and changes.

The Platform as a Product internal operating model is not a perfect approach, and it certainly isn’t the “only” way to utilise the power of cloud platforms for development teams, but this approach has been shown to work for teams and organisations at various stages of maturity.

In my mind, a product has a few key characteristics:

It has users / customers, who are the key stakeholders in how the product should deliver value.
It’s long-lived.
It evolves over time in response to the needs and desires of the customers.
It’s “owned” by a person or team.
It only does what it needs to do. It is as important to remove unused or under-used features as it is to evolve new features and functions.
It has a name.

Evan Bottcher describes a platform as follows:

A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced co-ordination.

From “What I Talk About When I Talk About Platforms”, published on martinfowler.com

In this article, I’ll try to outline my thinking on the rationale for, and benefits of, adopting a Platform as a Product approach to technology and to platform teams themselves. There are three core drivers behind the Platform as a Product approach, which are addressed in further detail below.

A platform should:

Reduce Developer Cognitive Load – software developers can be overloaded with system complexity, tooling, documentation, and organisational noise. The PaaP approach intends to reduce that cognitive load so that developers can focus on solving the problem and providing business value, quickly.

Reduce Operational Burden – this includes everything from reducing utilisation of people, reducing friction and handoffs, improving observability, to capacity management, and documentation. Basically – making everyone’s jobs easier alongside maximising the technology ROI for the business.

Optimise for Fast Flow – most, maybe all, organisations want to reduce the time it takes for an idea to begin returning value. This is true of commercial businesses, public sector, and other organisations such as charities. This involves optimising the technology for flow (automation, CI/CD tooling etc) as well as people, processes and practices. This is why we can’t separate the People from the Platform: the PaaP approach is not solely technological.

Thanks to Mike Hepburn for summarising these.

Types of work:

When we’re building software, we’re not manufacturing goods: i.e. we’re not producing the same thing over and over and trying to minimise variation. Neither are we building a single large deliverable, like a bridge or a custom car. We’re building a long-lived evolving product.

And that’s important to remember: products evolve over time, and there are elements of the product that we want to stay the same and benefit from stability, reliability and security as a result. But there are elements of the product that we want to change frequently, such as features, UX, and integrations. A platform as a product approach allows us to differentiate between the different needs and requirements of different parts of the system, and decouple those elements that need to change fast and often, from those that need to provide stability and reliability, and change less often. Different parts of the system demonstrate different levels of variability, and that’s where platforms come in.

Cost of Delay

Part of the reason we want to be able to move quickly and introduce new features and changes is to reduce our Cost Of Delay – see the charts below:

The chart above shows the cost of delay for a simple product: the sooner you get the product to market, the sooner you begin to realise value.

The chart here shows what happens if a competitor releases before you do. Not only do you miss out on the revenue due to the delay, but your competitor has grabbed the customers that you would have acquired if you’d released sooner. And because you can’t now get those customers, the loss is far greater: this is a case of missing the market window. So, let’s use a platform as a product approach to reduce our cost of delay.
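As a rough illustration of the arithmetic behind these two charts, the sketch below models cost of delay as the value lost over a fixed horizon when a launch slips. The function names, the flat weekly value, the 52-week horizon and the market_share_lost parameter are all assumptions made for illustration, not figures from any real product.

```python
def cumulative_value(launch_week, horizon_weeks, weekly_value):
    """Total value realised by the end of the horizon when launching at launch_week."""
    earning_weeks = max(0, horizon_weeks - launch_week)
    return earning_weeks * weekly_value


def cost_of_delay(delay_weeks, horizon_weeks, weekly_value, market_share_lost=0.0):
    """Value lost by launching delay_weeks late.

    market_share_lost is a hypothetical fraction (0 to 1) of weekly value that
    is permanently captured by a competitor who released first.
    """
    on_time = cumulative_value(0, horizon_weeks, weekly_value)
    late_weekly_value = weekly_value * (1 - market_share_lost)
    late = cumulative_value(delay_weeks, horizon_weeks, late_weekly_value)
    return on_time - late


if __name__ == "__main__":
    # Simple delay: only the weeks before launch are lost.
    print(cost_of_delay(delay_weeks=4, horizon_weeks=52, weekly_value=10_000))
    # Missed market window: the delay also costs a quarter of the market.
    print(cost_of_delay(delay_weeks=4, horizon_weeks=52, weekly_value=10_000,
                        market_share_lost=0.25))
```

In this toy model the same four-week slip costs four times as much once a competitor has taken a quarter of the market, which is the gap between the first chart and the second.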

Software Delivery Performance

Nicole Forsgren, in her book Accelerate (co-authored with Jez Humble and Gene Kim), describes four key DevOps delivery metrics:

Lead time to change
Deployment frequency
Mean time to restore (MTTR)
Change failure rate

These metrics all describe capabilities that ultimately reduce your cost of delay and improve customer satisfaction. So, let’s try to improve these metrics by implementing platform as a product approaches.

(Note: we shouldn’t be trying to reduce change failure rate or MTTR to zero, or to maximise deployment frequency at any cost – for all of these metrics, we should be optimising rather than pushing them to an extreme.)
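To make the four metrics concrete, here is a minimal sketch of how they might be calculated from a list of deployment records. The Deployment record, its field names and the sample data are hypothetical, and the choice of a median for lead time and a mean for restore time is an assumption rather than the definition used in Accelerate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one change reaching production."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise the four metrics for a non-empty list of deployments."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [hours(d.restored_at - d.deployed_at)
                     for d in failures if d.restored_at is not None]
    return {
        "median_lead_time_hours": median(lead_times),
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has been restored in the period.
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else None,
    }


def sample_deployments() -> List[Deployment]:
    """Four made-up deployments over four days, two of which failed."""
    start = datetime(2021, 11, 1, 9, 0)
    day, hour = timedelta(days=1), timedelta(hours=1)
    return [
        Deployment(start, start + 2 * hour),
        Deployment(start + day, start + day + 4 * hour,
                   failed=True, restored_at=start + day + 5 * hour),
        Deployment(start + 2 * day, start + 2 * day + 6 * hour),
        Deployment(start + 3 * day, start + 3 * day + 8 * hour,
                   failed=True, restored_at=start + 3 * day + 11 * hour),
    ]


if __name__ == "__main__":
    print(four_key_metrics(sample_deployments(), period_days=4))
```

Tracking all four together matters: a team could raise deployment frequency while its change failure rate climbs, and only the combined view would show it.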

Handoffs and Flow

Handoffs reduce flow. Just ask these baton runners. Even with many hours of practice, a handoff can still result in failure and an impediment to speed. So, let’s reduce handoffs between team members by using a platform as a product so that developers can use and deploy systems on-demand.

Not only do handoffs reduce flow, but if someone is busy, they can’t even accept the work you’re trying to give them. The only chart in The Phoenix Project illustrates this: wait time climbs steeply as utilisation increases above ~80%. So, let’s reduce utilisation to acceptable levels at the same time as reducing handoffs (or eliminating them completely) by adopting the platform as a product approach and saving developers time that might be spent on maintenance and management of the underlying infrastructure.
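The relationship behind that chart is usually summarised as wait time being proportional to the percentage of time busy divided by the percentage of time idle. The sketch below assumes that simplified ratio (it is a rule of thumb, not a full queueing model), and the function name is my own.

```python
def relative_wait_time(busy_percent: int) -> float:
    """Wait time in arbitrary units: percentage busy divided by percentage idle."""
    if not 0 <= busy_percent < 100:
        raise ValueError("busy_percent must be between 0 and 99")
    idle_percent = 100 - busy_percent
    return busy_percent / idle_percent


if __name__ == "__main__":
    for busy in (50, 80, 90, 95, 99):
        print(f"{busy}% busy -> {relative_wait_time(busy):.0f} units of wait")
```

Under this assumption, someone who is 50% busy adds one unit of wait, at 80% that becomes four units, and at 95% it is nineteen. The curve is strictly hyperbolic rather than exponential, but the practical message is the same: the last few points of utilisation are the expensive ones.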

Team Size

The two diagrams above show us that team sizes shouldn’t exceed certain numbers. Communication complexity shows us that above 8-9 team members (such as SLAM teams), communication becomes a blocker because the number of communication paths grows so rapidly with each person added. Dunbar’s theories also suggest (possibly as a result of communication complexity) that teams should remain below 15 people. So, let’s keep team sizes appropriately small by having a separate platform product team and development product teams.
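The communication complexity argument can be checked with a few lines of arithmetic. This sketch assumes the common model in which every pair of team members is one potential communication path, giving n(n-1)/2 paths for a team of n; the function name is illustrative.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for size in (3, 5, 8, 9, 15):
        print(f"{size} people -> {communication_paths(size)} paths")
```

A team of 5 has 10 paths, a team of 9 has 36, and a team of 15 has 105. The growth is quadratic rather than literally exponential, but it still means that adding one person to a large team costs far more coordination than adding one to a small team.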

Developer Cognitive Load

From Fraser, K. et al. (2018) “Cognitive Load Theory for debriefing simulations: implications for faculty development”, Advances in Simulation, 3(1). doi: 10.1186/s41077-018-0086-1.

Cognitive load is critical in software development.

– Intrinsic cognition is all the stuff you already know, such as how to make a decent cup of tea. Where the teabags are, how to operate the kettle, how much milk to put in, etc.

– Extraneous cognition is all the external stuff that you need to find out or understand, such as where someone left the teabags because you can’t find them. It also includes “noise” such as distractions, how to operate unfamiliar equipment, or how to comply with regulatory requirements.

– Germane cognition is active learning and problem solving. That’s the stuff of real value – such as comparing Tetley tea to Yorkshire Tea in a taste test to find out which is better.* It’s also the process by which learning is transferred from short-term to long-term memory. It’s only through Germane cognition that we actually achieve anything or provide real value.

So, let’s minimise our extraneous cognitive load for developers and reduce the need for intrinsic cognitive load, so that the maximum effort can be put into the germane problem solving. The platform product does this by improving the developer experience (DevEx) and making it easier for developers to do what they need to do without referring to documentation or asking someone how to do something.

Thinnest Viable Platform

The platform itself need only be as big as necessary to reduce the cognitive load of developers: so it may be sufficient for the platform to be a simple one-page repo describing how to deploy to AWS, or the platform could be a fully contained, multi-region, multi-cloud, self-healing pipeline and platform. Matthew Skelton refers to this approach as the “Thinnest Viable Platform”.

Evidence for Platform As A Product success

The State of DevOps reports from 2020 and 2021 provide strong evidence that adopting a platform as a product approach, using internal platform teams, improves software delivery performance, via many of the mechanisms described above.

Platform Teams

Matthew Skelton and Manuel Pais, in their book Team Topologies, describe four team types that enable fast flow in software development (and in other domains too). We won’t go into all four types and the three interaction modes here – there’s a ton of great information on the Team Topologies website. The platform team is essentially just like any other stream-aligned team, except their product is the platform itself, and their customers are the developers who use it.

Personal note: I’m uncomfortable with the premise of teams in an organisation categorising other teams as customers or suppliers because it can subordinate one team to another. I worry that language could lead us back to the bad old days of Ops teams being subordinate to Dev teams and a reversion back to silos. Instead, I suggest that you can use this concept to determine the boundaries of teams and create “Team APIs” and social contracts to surface and make explicit how teams communicate with each other.

Platform As A Product

I would strongly advise also looking at internalising service design capabilities and expertise to help teams design and build the platform. The platform as a product approach is fundamentally a practice to enable improved efficiency, improved product quality and reliability and faster speed to market, via reducing cognitive load for developers, faster flow of work, reduced handoffs, and enabling developers to focus on delivering value.

Criticisms of the Platform as a Product Approach

This approach isn’t a silver bullet. As with any framework or defined practice, it should be considered as a stage of a journey. It may well be the case that very mature and highly capable technology delivery teams don’t require this approach, and can instead adopt a polycentric, shared commons approach that distributes ownership of the platform across multiple teams rather than giving it a dedicated team. If you’re interested in exploring this further, check out Jabe Bloom on the Boundaryless podcast – Platforming inside and between organizations: differentiation, scale, and scope.

However, I believe that most teams are not yet at that stage of maturity or capability, and it may take a long time to get there, so I feel that the Platform as a Product approach is a valid and effective path to high performance and effective delivery for most organisations.

If you’d like to find out more about this approach, join a workshop to enable your teams to adopt it, or find out more about the ways to evolve your organisational dynamics and team structures, get in touch with me, or hit up Matthew Skelton and the folks at Conflux who can help to power up your people, processes and technology.

Coming soon – the platform as a product playbook.

*Obviously we don’t actually need to run that experiment, Yorkshire Tea is clearly better.

November 5, 2021 (updated August 11, 2023)

Hybridisation and the heterosis of hybrid work.

Hybrid work is upon us.

Hybridisation in biological systems often creates a phenomenon known as heterosis (also known as “hybrid vigour”): where the combining of two distinct varieties or genotypes results in a far stronger, more vigorous offspring, even though the resulting hybrid is usually sterile. Many commercial crop varieties are based on this principle, and the mule is a good example too, as the offspring of a male donkey and a female horse.

I’ve also been thinking about “hybrid” ways of working, and whether this kind of hybridisation also results in stronger and “better” outcomes. It’s much, much harder to create successful hybrid working systems and environments, but if we get it right, it allows us to exploit the benefits of both: the time saving efficiencies and comfort of remote and home working, combined with the power of high-bandwidth, in-person collaboration. But done badly, it results in the exclusion of individuals dialling in remotely to an in-person meeting, unpredictable travel patterns and lack of habit and ritual formation that’s so important to a high performing team.

If we’re going to make hybrid work, work for us, we need to be very intentional in designing the systems, processes, environments and practices that we use. And we must adopt an experimental approach, constantly evaluating and re-evaluating our decisions in order to keep, improve, or discard them in response to feedback.

Hybridisation of work can make us stronger, but only if we’re intentional and humble in our approach. If we are not, we risk degrading our outcomes as well as burning out our people.

Maybe the sterility of the outcome is where the analogy ends, however!

October 7, 2021 (updated October 22, 2021)

Digital Transformation and Organisational Dysfunctions

A gap between strategy and delivery – we know what we want to do, and we have the people and tools to do it, but we can’t seem to do it. We end up building something different to what we intended in the strategy. This may be a sign of weak strategy, or it may be a product ownership problem – translating business strategy into the products and services to be delivered.
A gap between desire for pace of delivery and ability to deliver. We want to go at 100mph, but we can only go at 30mph. The pace of delivery may be constrained by capability, tooling, process, constraints, or simply capacity. It’s usually not actually a capacity problem, however. In technological domains, this is the realm of DevOps transformations and the practices that enable value to be delivered at high velocity whilst maintaining reliability and quality.
A lack of organisational observability that results in poor understanding of value flows across the organisation and poor awareness of sociotechnical aspects of the system as a whole. Problems that are well known to teams take leadership by surprise, if leadership ever becomes aware of them at all. A lack of systems thinking, combined with poor psychological safety across the organisation, results in executives only being told what they want to hear, or information becoming diluted as it flows “up”.
Short termism – poor incentive structures (indeed, most incentives) or cultures mean that people are focused on immediate short term wins rather than long term value and outcomes. This is also manifested by an adherence to project methodologies where the delivery of value has a start and, specifically, an end date, instead of a long-lived product approach that provides people with greater ownership of outcomes, longer lived teams, lower technical and operational debt, and higher quality products and services.
Quality issues. We can build the right things, but we can’t do it well. Conflicts of interest or capability issues mean that products and services are delivered, but they suffer from reliability, consistency or architectural problems. Technical and operational debt is high, and teams feel like they are always firefighting and dealing with unplanned work. An approach of late inspection rather than building quality in to the process is often part of the cause of this dysfunction.
Poor organisational ability to learn. Systems, cultures and processes hinder the ability of people (and groups of people, such as teams or business units) to learn from failures and successes. The same mistakes are made repeatedly, and when successes do happen, the valuable learning from them is not institutionalised. Psychological safety, along with rituals such as retrospectives, may be lacking in this organisation.
An excessive inward focus. Focussing too much on “what we do” rather than looking out at the world for challenges, opportunities, and a changing landscape means that opportunities are wasted and challenges can present existential threats to the organisation through a lack of capability to become aware of them, let alone adapt to them. A strong organisational cultural identity, whilst a powerful and valuable aspect of an organisation, can result in this dysfunction.
An excessive outward focus. A focus only on the external means that market and environmental opportunities and threats are detected, but threats to performance or opportunities for improvement arising from inside the organisation are not detected, mitigated or exploited.
A bimodal approach to value where the products and services delivered to customers far exceed the quality and features of those delivered to people within the organisation who are expected to use those services to do their job. We wouldn’t provide surgeons with blunt scalpels and expect a great result for the “customer”, but many organisations provide poor quality services and tools to employees whilst expecting high quality outcomes.
A culture of fear over a culture of experimentation. An organisation that enforces behaviour or strives towards goals based on the consequences of failure or divergence from the norms, will move much slower and gradually grind to a halt. In these organisations, the safest thing to do is to comply with rules and take as few risks as possible, rather than suggest ideas, try (and risk failure), or admit mistakes.

September 22, 2021 (updated February 16, 2022)

The Accelerate State of DevOps Report 2021 – A Summary

The two State of DevOps reports published each year aggregate the current state of technology organisations globally with respect to our collective transformation towards delivering value faster and more reliably. Or as Jonathan Smart puts it, “Sooner, Safer, Happier”.

The DevOps shift has been in progress for over a decade now, and whilst DevOps was always really about culture, the most recent reports are now emphasising the importance of culture, progressive leadership, inclusion, and diversity more than ever before.

Last year, in 2020, the core findings of the State of DevOps Report focussed on:

The technology industry in general still had a long way to go and there remained significant areas for improvement across all sectors.
Internal platforms and platform teams are a key enabler of performance, and more organisations were starting to adopt this approach.
Adopting a long-term product approach over a short-term, project-oriented one improves performance and facilitates improved adoption of DevOps cultures and practices.
Lean, automated, and people-oriented change management processes improve velocity and performance over traditional gated approaches.

This year (2021), there are a number of key findings in the Accelerate State of DevOps Report, building on previous iterations:

The “highest performers” continue to improve the velocity of delivery, through practices that enable teams to continually identify improvements to tooling, technology and process.
Adoption of SRE practices improves wider organisational performance. Teams that prioritise both delivery and operational excellence report the highest organisational performance. Reliability is as important as, if not more important than, short lead time for changes.
Adoption of cloud technology accelerates software delivery and organisational performance, and enables the five capabilities of cloud native technology. Multi-cloud adoption is increasing, so that teams can utilise the strengths of each provider and improve resilience against risk of a single provider failure.
Secure Software Supply Chains that integrate security practices into pipelines and processes enable teams to deliver secure software quickly, safely and reliably.
Documentation is important. Teams that create and maintain high quality documentation are more able to implement technical practices, make changes, and recover from incidents.
Inclusive and generative team cultures improve resilience and performance. Teams with psychologically safe and inclusive cultures suffered less from burnout during the Covid-19 pandemic.

View the entire 2021 Accelerate State of DevOps report here.

View the 2021 Puppet State of DevOps Report summary here.

And read here a summary of all the State of DevOps reports since 2013!

September 21, 2021 (updated February 14, 2022)

Health Promotion and HIV/AIDS pandemics in the UK and South Africa

(Originally submitted as coursework towards my Masters in Global Public Health at the University of Manchester)

It is quite clear that the UK and South Africa are in very different situations with respect to the HIV/AIDS epidemic. This is due in large part to behavioural changes in injecting drug users (IDUs) (Stimson, 1995) and adoption of safe sex practices including increased condom use amongst gay men, in the UK (Fitzpatrick et al, 2013).

As this chart shows, the disparity between the two countries is huge. In 2017, there were 7,149 new cases of HIV in the UK, but 276,496 in South Africa.

Chart 1. New cases of HIV in the UK and South Africa, 1990 to 2017. Roser and Ritchie, 2019.

Whilst it is clear that new cases in South Africa are falling, prevalence of HIV/AIDS continues to increase as shown in chart 2:

Chart 2. Prevalence, new cases and deaths from HIV/AIDS in South Africa, 1990 to 2017. Roser and Ritchie, 2019.

The decreasing number of new infections in South Africa is due in large part to increased condom use and anti-retroviral treatment (ART) (Vandormael et al, 2019), alongside higher engagement by women in the healthcare system – women are more likely than men to request tests for HIV, and to request and access ART and therefore become non-infectious for HIV (Birdthistle et al, 2019).

However, ART is expensive. Behaviour change is more difficult, but has a much greater ROI (Return On Investment). Due to the UK’s early approach of addressing behaviour change in high-risk groups, prevalence of HIV/AIDS has remained low, which, combined with wider adoption of safe sex and safer drug use practices, helps to keep incidence rates low (Stimson, 1995). Thus, the UK does not need to rely on large-scale ART interventions like South Africa, which is reflected in the costs each country must bear as shown in chart 3.

Chart 3. HIV expenditure on prevention and treatment, 2006 to 2014. Roser and Ritchie, 2019.

In 2009, South Africa spent $2.33 billion on HIV prevention and treatment, whilst the UK spent $80.3 million. It is unfortunately true that whilst prevalence is so high, ART is necessary to prevent significant increases in incidence rates, and increased cost-effectiveness may indeed be achieved by oral pre-exposure prophylaxis (PrEP) – the provision of ART treatments to individuals in high-risk contexts (Alistar et al, 2014).

Whilst in South Africa, increased ART provision (and spending), including to those high-risk groups not (yet) infected with HIV, is necessary for the promotion of health, in the UK the story is somewhat different. The low prevalence of the disease means that safe sex practices – the continued emphasis on condom use – and expansion of access to HIV testing, alongside continuation of ART for people living with HIV/AIDS and the use of PrEP for those with an HIV-positive partner, could result in the near-elimination of HIV transmission (Brown et al, 2018).

Word Count: 461

References:

Alistar, S.S., Grant, P.M. and Bendavid, E., 2014. Comparative effectiveness and cost-effectiveness of antiretroviral therapy and pre-exposure prophylaxis for HIV prevention in South Africa. BMC medicine, 12(1), pp.1-11.

Birdthistle, I., Tanton, C., Tomita, A., de Graaf, K., Schaffnit, S.B., Tanser, F. and Slaymaker, E., 2019. Recent levels and trends in HIV incidence rates among adolescent girls and young women in ten high-prevalence African countries: a systematic review and meta-analysis. The Lancet Global Health, 7(11), pp.e1521-e1540.

Brown, A.E., Nash, S., Connor, N., Kirwan, P.D., Ogaz, D., Croxford, S., De Angelis, D. and Delpech, V.C., 2018. Towards elimination of HIV transmission, AIDS and HIV‐related deaths in the UK. HIV medicine, 19(8), pp.505-512.

Fitzpatrick, R., McLean, J., Boulton, M., Hart, G. and Dawson, J., 2013. Variation in sexual behaviour in gay men. In AIDS: individual, cultural and policy dimensions (pp. 129-140). Routledge.

Roser, M. and Ritchie, H., 2019. HIV/AIDS–Our World in Data. Available at: https://ourworldindata.org/hiv-aids (Accessed: 15 June 2021).

Stimson, G.V., 1995. AIDS and injecting drug use in the United Kingdom, 1987–1993: the policy response and the prevention of the epidemic. Social science & medicine, 41(5), pp.699-716.

Vandormael, A., Akullian, A., Siedner, M., de Oliveira, T., Bärnighausen, T. and Tanser, F., 2019. Declines in HIV incidence among men and women in a South African population-based cohort. Nature communications, 10(1), pp.1-10.