I don’t consider myself particularly stupid, and I really enjoy DevOps, but every year I try to read the Accelerate State of DevOps Report (SoDR) and every year it just makes me feel dumb. The language keeps me from the gold within. Take the following sentence, for example: “Contextualizing the research is possible when practitioners have conversations about how work is completed today”. I don’t fully understand that sentence on a first read-through. I find it muddy, unclear. The language, combined with the design of the PDF, lays a plate of dark-tinted glass over the entire thing.
Other technical documents - AWS Whitepapers, for example - don’t do this to me, or at least not as much. Perhaps it’s because the focus of those whitepapers is singular, or at least far less abstract. Either way, I think sentences like the example above are just a bit daft (“We can understand the research better when people talk about how they currently work”, perhaps?).
Anyway, I…
- Broke the PDF up into single pages
- Converted each page from PDF to a .txt file
- Ran each page through the Anthropic API, asking the LLM to reformat the text into pretty Markdown
- Generated a simple-language summary at the bottom of each page
- Combined all the pages into a single document, below (a rough sketch of the pipeline follows this list)
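For the curious, here’s a minimal sketch of that pipeline. It assumes the pypdf and anthropic Python packages; the model name, prompt, and file paths are illustrative rather than exactly what I used.

```python
# pip install pypdf anthropic
from pathlib import Path

import anthropic
from pypdf import PdfReader

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = (
    "Reformat the following report page as clean Markdown, "
    "then add a short plain-language summary at the bottom:\n\n{page_text}"
)

def convert_report(pdf_path: str, out_path: str) -> None:
    reader = PdfReader(pdf_path)
    sections = []
    for i, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""       # page -> plain text
        message = client.messages.create(      # text -> Markdown + summary
            model="claude-3-5-sonnet-latest",  # illustrative model name
            max_tokens=2048,
            messages=[{"role": "user", "content": PROMPT.format(page_text=text)}],
        )
        sections.append(message.content[0].text)
        print(f"processed page {i}")
    # stitch every page back together into one document
    Path(out_path).write_text("\n\n".join(sections), encoding="utf-8")

if __name__ == "__main__":
    convert_report("sodr-2023.pdf", "sodr-2023.md")
```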
The following is the output of that process. It brought me no closer to the goal - it’s actually a little awful, but the journey was fun, and that’s what matters, right? (The GOAT diagrams are the best part.)
Accelerate State of DevOps Report 2023
Introduction
The Accelerate State of DevOps Report 2023 is an annual report that provides insights into the current state of DevOps practices and their impact on organizational performance. The report is based on a survey of professionals from various industries and aims to identify the key factors that contribute to successful DevOps implementations.
Key Findings
- High-performing organizations continue to outperform their peers in terms of deployment frequency, lead time for changes, time to restore service, and change failure rate.
- Automation and continuous delivery practices are strongly associated with improved performance outcomes.
- Psychological safety and a culture of learning and experimentation are critical for fostering innovation and improving performance.
- The use of cloud technologies and modern architectures, such as microservices and containers, is increasing among high-performing organizations.
- The adoption of site reliability engineering (SRE) practices is growing, with a focus on balancing reliability and innovation.
Conclusion
The Accelerate State of DevOps Report 2023 highlights the importance of embracing DevOps practices, fostering a culture of collaboration and experimentation, and leveraging modern technologies to drive organizational performance. By adopting these practices and principles, organizations can improve their ability to deliver value to their customers and maintain a competitive edge in an increasingly digital world.
Executive Summary
Prelude
This research aims to provide leaders and practitioners with insights on where they can make an impact. The research explored three key outcomes and the capabilities that contribute to achieving them:
Outcome | Description |
---|---|
Organizational performance | The organization should produce not only revenue, but value for customers and the extended community. |
Team performance | The ability for an application or service team to create value, innovate, and collaborate. |
Employee well-being | The strategies an organization or team adopts should benefit the employees—reduce burnout, foster a satisfying job experience, and increase productivity. |
The research also explored performance measures that are often talked about as ends-in-themselves:
Performance Measure | Description |
---|---|
Software delivery performance | Teams can safely, quickly, and efficiently change their technology systems. |
Operational performance | The service provides a reliable experience for its users. |
For nearly a decade, the DORA research program has been investigating the capabilities and measures of high-performing technology-driven organizations. They have heard from more than 36,000 professionals from organizations of every size and across many different industries.
DORA tries to understand the relationship between ways of working (capabilities) and outcomes, which are meaningful accomplishments that are relevant across an organization and to the people in it. This research uses rigorous statistical evaluation and is platform-agnostic.
Summary
The DORA research program has been studying high-performing technology organizations for almost 10 years. They look at the relationship between how organizations work (their capabilities) and the outcomes they achieve, like organizational performance, team performance, and employee well-being. The research also looks at measures like software delivery performance and operational performance. The goal is to give leaders insights into where they can make changes to improve their organization’s performance and the well-being of their employees.
The key findings emphasize the importance of establishing a healthy culture, focusing on users, and balancing technical capabilities to drive organizational performance and employee success. Some key points:
- Generative cultures lead to 30% higher organizational performance.
- A user focus results in 40% higher organizational performance.
- Faster code reviews are linked to 50% higher software delivery performance.
- High-quality documentation greatly amplifies the impact of technical capabilities on organizational performance.
- Cloud computing and infrastructure flexibility lead to 30% higher organizational performance.
- Balancing delivery speed, operational performance, and user focus yields the best results and improves employee well-being.
- Underrepresented groups and women tend to experience higher levels of burnout, likely due to taking on more repetitive work. Ensuring a fair distribution of work is crucial.
Applying DORA Insights in Your Context
To get the most out of the DORA research, it’s important to consider it in the context of your own team and users. For example, while teams with faster code reviews have 50% higher software delivery performance, your performance may not improve if code reviews are already fast but speed is constrained elsewhere in the system. Contextualizing the research requires conversations about how work is completed today, which can lead to improved empathy, collaboration, and understanding of each participant’s motivations.
Improvement work is never done. The process involves finding a bottleneck in your system, addressing it, and repeating the process. The most important comparisons come from looking at the same application over time, rather than comparing to other applications, organizations, or industries.
Metrics and Measurements
Metrics and dashboards help teams monitor their progress and correct course. While practitioners and leaders strive for organizational performance, team performance, and well-being, measurement itself is not the goal, just as delivering software is not the goal.
Fixating on performance metrics can lead to ineffective behaviors. Instead, investing in capabilities and learning is a better way to enable success. Teams that learn the most improve the most.
You Cannot Improve Alone
We can learn from each other’s experiences. The DORA Community site (https://dora.community) is an excellent forum for sharing and learning about improvement initiatives.
Summary: This page discusses how to apply insights from the DORA (DevOps Research and Assessment) research in your own context. It emphasizes the importance of contextualizing the research through conversations about current work processes, focusing on continuous improvement by addressing bottlenecks, and avoiding fixation on performance metrics. Instead, investing in capabilities and learning is encouraged. The page also highlights the value of learning from others’ experiences through the DORA Community site.
Concepts and Measures
This section describes the concepts DORA tries to measure, which form the foundation of both the report and the models used. It is important for the authors to be clear and consistent about these concepts.
Key points:
- Multiple indicators are often used to capture multifaceted concepts
- Exploratory and confirmatory factor analysis is used to evaluate the success of capturing these concepts (more details in the Methodology section)
- Scores are scaled from 0 to 10, with 0 representing the complete lack of a concept and 10 representing its maximum presence
- This standardizes how the concepts are discussed and allows for comparing data across years
Each concept is presented with:
- An icon for easier reference
- The average score (mean) for the concept in the sample
- The boundaries of the interquartile range (25th and 75th percentiles) to show the spread of responses
- The median value, which, if dramatically different from the mean, may indicate skewed data (a quick sketch of these summary statistics follows this list)
- A description of the concept and how it is measured
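To make the descriptive statistics concrete, here is a small sketch of how a concept’s mean, median, and interquartile range would be computed from 0-10 scaled scores. The responses below are made up, not DORA’s data.

```python
import statistics

import numpy as np

# Hypothetical 0-10 scaled responses for one concept (not real survey data)
scores = [3.3, 5.0, 6.7, 6.7, 7.5, 7.8, 8.3, 8.3, 9.2, 10.0]

mean = statistics.mean(scores)              # the average score reported per concept
median = statistics.median(scores)          # middle value; a skew check against the mean
q25, q75 = np.percentile(scores, [25, 75])  # boundaries of the interquartile range

print(f"mean={mean:.1f} median={median:.1f} IQR={q25:.1f}-{q75:.1f}")
```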
Summary: This page introduces the key concepts DORA measures and explains how they are presented in the report. Scores are standardized on a 0-10 scale for consistency and comparison across years. Each concept is accompanied by an icon, average score, interquartile range, median, and description for clarity and ease of reference.
Key Outcomes
The key outcomes are the goals that people, teams, or organizations strive to reach or avoid. These measures are important for self-evaluation and assessing performance at various levels.
The key outcomes discussed are:
Outcome | Mean | Median | IQR | Description |
---|---|---|---|---|
Organizational performance | 6.3 | 6.3 | 5-8 | High performing organizations have more customers, higher profits, and more relative market share for their primary product or service. |
Team performance | 7.6 | 8 | 6.6-9 | High performing teams adapt to change, rely on each other, work efficiently, innovate, and collaborate. |
Software delivery performance | 6.3 | 6.4 | 5.1-7.8 | Measured by deployment frequency, lead time for changes, change failure rate, and failed deployment recovery time. |
Operational performance | 6.2 | 6.3 | 5-7.5 | The extent to which a service meets user expectations, including availability and performance. |
Job satisfaction | 6.08 | 7.1 | 5.7-7.1 | A single-item question that asks respondents to rate how they feel about their job as a whole. |
Burnout | 4.1 | 4 | 2-6 | The psychological and physical toll of work, and how one appraises the value and meaning of their work. Burnout causes cynicism. |
Productivity | 7.5 | 7.9 | 6.7-8.8 | A productive person does work that aligns with their skills, creates value, and lets them work efficiently. |
Reliability targets | 7 | 7.5 | 5-7.5 | The extent to which a service meets its stated goals for measures like availability, performance, and correctness. |
Well-being is a composite of burnout, productivity, and job satisfaction.
Summary:
This page discusses the key outcomes that people, teams, and organizations strive to achieve or avoid. These outcomes include organizational performance, team performance, software delivery performance, operational performance, job satisfaction, burnout, productivity, and reliability targets. The page provides a brief description of each outcome and presents their mean, median, and interquartile range (IQR) values. Additionally, it mentions that well-being is a combination of burnout, productivity, and job satisfaction.
This page discusses various reliability practices, processes, and technical capabilities that teams use to improve the operational performance of services. These include:
- Artificial intelligence contribution: The role of AI in contributing to technical tasks (mean: 3.3, median: 2.4)
- Documentation: The quality of written content used in daily work (mean: 5.8, median: 6.25)
- Code review speed: The time it takes from pull request to code change review (mean: 6.5, median: 6)
- Trunk-based development: Making small, frequent changes regularly merged into the main code branch (mean: 5.6, median: 5.6)
- Continuous integration: Automatically building and testing software changes (mean: 6.9, median: 7.8)
- Loosely coupled architecture: Software that can be written, tested, and deployed independently (mean: 6.4, median: 6.7)
- Continuous delivery: Getting changes into production safely, quickly, and sustainably (mean: 7.0, median: 7.3)
- Flexible infrastructure: Scalable infrastructure that is elastic, accessible, and measured (mean: 6.6, median: 7.3)
The page lists the mean and median for each practice, giving a sense of how widely each one is adopted.
Culture Aspects
Defining culture is challenging, but it can be described as the prevailing norms (such as flexibility), the prevalent orientation (such as user-centricity), and the ambience (such as organizational stability) of the workplace.
Key Culture Aspects
Aspect | Mean | Median | IQR | Description |
---|---|---|---|---|
User-centrism | 7.8 | 7.8 | 5.6-8.3 | Understanding and incorporating users’ needs and goals to make products and services better. |
Knowledge sharing | 6.4 | 6.7 | 5.0-8.3 | How ideas and information spread across an organization. Team members answer questions once and make the information available to others. People don’t have to wait for answers. |
Westrum organizational culture | 7.3 | 7.8 | 6.1-8.6 | How an organization tends to respond to problems and opportunities. There are three types of culture: generative, bureaucratic, and pathological. |
Job security | 5.9 | 6.7 | 3.3-8.3 | A single-item measure that asks people how often they worry about their job security. Higher scores equal less worry. |
Work distribution | 5.8 | 5.8 | 3.8-7.9 | Formal processes to help employees distribute tasks equitably within a team. |
Organizational stability | 7.2 | 8.3 | 6.7-8.3 | A single-item measure that asks how stable or unstable the work environment is for employees. |
Flexibility | 7.7 | 8.3 | 6.6-8.9 | How, where, and when a person works on tasks. |
Summary
This page discusses various aspects of organizational culture and their importance in the workplace. Key aspects include user-centricity, knowledge sharing, Westrum organizational culture, job security, work distribution, organizational stability, and flexibility. The table provides an overview of each aspect, including its mean, median, and interquartile range (IQR) values, along with a brief description. These aspects collectively contribute to the overall culture of an organization and can significantly impact employee satisfaction, productivity, and the success of the organization as a whole.
Chapter 1 Takeaways
The first step in improving performance is to set a baseline for an application’s current software delivery performance, operational performance, and user-centricity. These measures help teams evaluate how they’re doing and provide a good signal for how things are changing over time.
These measures, though, are not the means by which a team will improve. With this baseline, it is important to assess a team’s strength across a wide range of people, processes, and technical capabilities to identify which might be holding back progress. Next, teams need the time and space to align, experiment, and reassess. Repeating this process will help teams adopt a mindset and practice of continuous improvement.
Watch out for these and other pitfalls when using these comparisons:
- Unlike comparisons. Comparing applications based solely on these clusters is not likely to be useful. Doing so discards the context of each application in ways that might be detrimental to the goal of improving.
- Setting metrics as a goal. Ignoring Goodhart’s law and making broad statements like “every application must demonstrate ’elite’ performance by year’s end” increases the likelihood that teams will try to game the metrics.
- One metric to rule them all. Attempting to measure a complex system with the “one metric that matters.” Use a combination of metrics to drive deeper understanding instead.
- Narrowly scoped metrics. People tend to measure what is easiest to measure, not what is most meaningful.
- Using industry as a shield against improving. For example, some teams in highly regulated industries might use regulations as a reason not to disrupt the status quo.
Goodhart’s law: when a measure becomes a target it ceases to be a good measure.
Summary
This chapter discusses the importance of setting a baseline to measure an application’s current performance across software delivery, operations, and user-centricity. However, these measures alone will not drive improvement. Teams need to assess their capabilities, experiment, and continuously improve. The chapter warns against pitfalls like comparing applications solely based on performance clusters, setting metrics as goals (which encourages gaming), relying on a single metric, measuring only what’s easy rather than what’s meaningful, and using industry regulations as an excuse not to improve. Goodhart’s law states that when a measure becomes a target, it is no longer a good measure.
Introduction
- Cluster analyses are performed annually to identify common trends across applications.
- Teams should use these analyses for understanding their performance, but avoid fixating on comparisons with other applications.
- The best comparisons are those made over time for the same application.
- Teams that focus on user needs are better equipped to build the right software in the right way.
Results
Software delivery performance is assessed using the following measures:
Measure | Description |
---|---|
Change lead time | Time taken for a change to go from committed to deployed |
Deployment frequency | How often changes are pushed to production |
Change failure rate | How often a deployment causes a failure requiring immediate intervention |
Failed deployment recovery time | Time taken to recover from a failed deployment |
Reducing the batch size of changes for an application is a common approach to improve all four measures. Smaller changes are easier to reason about, move through the delivery process, and recover from if there’s a failure. Teams should aim to make each change as small as possible to ensure a fast and stable delivery process. This approach contributes to both change velocity and change stability.
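Purely as an illustration (this is not from the report), here is how a team might compute the four measures from its own deployment records; the record fields are assumptions, and in practice you would likely prefer medians over means for the skewed duration measures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    committed_at: datetime                # when the change was committed
    deployed_at: datetime                 # when it reached production
    failed: bool                          # did it need immediate remediation?
    restored_at: datetime | None = None   # when service was restored, if it failed

def dora_metrics(deploys: list[Deployment], window_days: int = 30) -> dict:
    """Summarise the four software delivery measures over a reporting window."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "change_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "deployment_frequency_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        "failed_deployment_recovery_time": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```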
Summary: This page discusses the importance of cluster analyses in understanding software delivery performance trends across applications. It emphasizes the significance of focusing on user needs and reducing the batch size of changes to improve key performance measures such as change lead time, deployment frequency, change failure rate, and failed deployment recovery time. By making changes as small as possible, teams can enhance both the speed and stability of their delivery process.
Performance level | Deployment frequency | Change lead time | Change failure rate | Failed deployment recovery time | % of respondents |
---|---|---|---|---|---|
Elite | On demand | Less than one day | 5% | Less than one hour | 18% |
High | Between once per day and once per week | Between one day and one week | 10% | Less than one day | 31% |
Medium | Between once per week and once per month | Between one week and one month | 15% | Between one day and one week | 33% |
Low | Between once per week and once per month | Between one week and one month | 64% | Between one month and six months | 17% |
This table shows the software delivery performance of the survey respondents, categorized into four levels: Elite, High, Medium, and Low. The performance levels are based on four key metrics:
- Deployment frequency: How often the organization deploys code changes.
- Change lead time: The time it takes from code commit to running in production.
- Change failure rate: The percentage of changes that result in degraded service or require remediation.
- Failed deployment recovery time: The time it takes to restore service when a change fails.
Elite performers deploy on demand, have less than one day change lead time, a 5% change failure rate, and recover from failures in less than an hour. On the other hand, Low performers deploy between once per week and once per month, have a change lead time of one week to one month, a 64% change failure rate, and take between one to six months to recover from failed deployments.
The majority of respondents fall into the High (31%) and Medium (33%) categories, while 18% are Elite performers and 17% are Low performers.
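If you want to see roughly where your own application sits, a simplified rule-of-thumb check against the thresholds in the table might look like the sketch below. Note that the report derives these levels from cluster analysis of survey responses, so this is only a loose approximation, and the numeric cut-offs (for example, treating “on demand” as at least daily deployments) are my assumptions.

```python
def performance_level(
    deploys_per_day: float,
    lead_time_days: float,
    change_failure_rate: float,   # expressed as a fraction, e.g. 0.05 for 5%
    recovery_time_hours: float,
) -> str:
    """Loose, rule-of-thumb mapping onto the 2023 performance levels above."""
    if deploys_per_day >= 1 and lead_time_days < 1 and change_failure_rate <= 0.05 and recovery_time_hours < 1:
        return "Elite"
    if deploys_per_day >= 1 / 7 and lead_time_days <= 7 and change_failure_rate <= 0.10 and recovery_time_hours < 24:
        return "High"
    if change_failure_rate <= 0.15 and recovery_time_hours <= 24 * 7:
        return "Medium"
    return "Low"

# Example: daily deployments, two-day lead time, 8% failure rate, 4-hour recovery
print(performance_level(1.0, 2.0, 0.08, 4.0))  # -> "High"
```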
Operational performance
We assessed operational performance by asking respondents how frequently their service does the following:
- Receives reports about end users being dissatisfied with the reliability of the system.
- Is unavailable, performs slower than expected, or performs incorrectly.
For an exploration of how operational performance predicts organizational performance, see Chapter 5 - Reliability unlocks performance.
User-centricity
A user-centric application or service is built with the end user in mind. Building a product like this requires a good sense of what users need and incorporating that into the product’s roadmap. We assessed respondents’ user-centricity by asking them the extent to which the following are true:
- Their team has a clear understanding of what users want to accomplish.
- Their team’s success is evaluated according to the value they provide to their organization and to the users of the application.
- Specifications (for example, requirements planning) are continuously revisited and reprioritized according to user signals.
Summary
This page discusses two key metrics used to assess the performance of software development teams: operational performance and user-centricity.
Operational performance measures how often a service has reliability issues or performs poorly. User-centricity evaluates how well a team understands and prioritizes user needs when building applications.
The data shows the distribution of scores across respondents. For user-centricity, the middle 50% scored between 5.6 and 8.3 out of 10. For operational performance, the middle 50% scored between 5 and 7.5 out of 10.
Understanding these metrics can help teams identify areas for improvement in delivering reliable, user-focused software. The full report explores how these factors ultimately impact an organization’s overall performance.
Team Type | Software Delivery Performance | Operational Performance | User-Centricity |
---|---|---|---|
User-centric | 7.5 | 7.5 | 7.5 |
Feature-driven | 5.0 | 5.0 | 5.0 |
Developing | 2.5 | 2.5 | 2.5 |
Balanced | 10.0 | 10.0 | 10.0 |
Summary: This page discusses four types of teams based on their software delivery performance, operational performance, and user-centricity. The team types are User-centric, Feature-driven, Developing, and Balanced. The table shows the mean scores for each team type across the three dimensions on a scale of 0-10. The GOAT diagram visually represents where each team type falls in relation to the three dimensions. Balanced teams score the highest across all dimensions, while Developing teams score the lowest. User-centric and Feature-driven teams fall in between, with User-centric teams leaning more towards operational performance and user-centricity, and Feature-driven teams leaning more towards software delivery performance.
Think of the performance metrics we’ve been discussing as dials that an organization or team can adjust to change the organizational performance, team performance, and the well-being of the individuals on the team.
The graphs below show the performance outcomes predicted by each team type.
Each team type has unique characteristics, makes up a substantial proportion of our respondents, and has different outcomes. Your own team likely does not fit cleanly into only one, nor would we expect your team type to remain constant over time.
How do you compare?
Team type | Predicted burnout | Predicted job satisfaction |
---|---|---|
User-centric | 5.0 | 6.5 |
Feature-driven | 5.5 | 6.0 |
Developing | 6.0 | 5.5 |
Balanced | 6.5 | 5.0 |