“Those who cannot remember the past are condemned to repeat it.”
-George Santayana, Harvard Philosopher
Did you know that some software development techniques have been around since the 1950s? In the past decade, however, teams have shown a strong desire to move away from traditional project management frameworks like Waterfall and toward Agile methodologies. Agile techniques favor the continuous development and deployment of small code components over Waterfall's large, milestone-based releases. Some Agile coaches focus heavily on the positive outcomes of this work, which can lead to accomplishment-focused meetings and retrospectives that are ceremonial rather than actionable. This puts them at a disadvantage when it comes to a crucial aspect of Agile health: change.
Root Cause Analysis (RCA), the study of software issues and their origins, is an important tool that organizations tend to overlook. There is no better way to learn from your mistakes than to break them down and understand them. The natural inclination is to avoid this awkward exposition of past blunders, but some simple best practices will make it a good experience. The goal of RCA is not to establish blame, but to learn from errors and therefore avoid making the same ones more than once. Organizations truly adept in RCA can even leverage it to avoid new mistakes. It’s hard to focus on our errors and weaknesses, but what can be derived from that exercise is essential to an organization’s ability to grow.
Companies deploy unit tests and a quality assurance team to catch mistakes. If you look at where QA sits in the process, it’s at the very end of the conveyor belt. QA engineers watch what comes out, pile up the defects, and send them one step back in the workflow to be fixed. Ask any QA engineer “do you ever see the same bugs more than once?” or “have you ever missed a bug?” and you’ll start to see the cracks in the system. The best QA team in the world will never find every bug. They are not set up to succeed. They are set up to improve quality, but they are not set up to avoid risk.
The best way to avoid risk is a team dedicated to Root Cause Analysis.
A good Root Cause Analysis team does not meet on a recurring basis. They meet when necessary: when a live issue causes a financial loss, or an issue is client facing, or customers are frustrated. Each company should establish its own threshold so that an RCA is not held every time there’s a bug in the live environment, but also not so infrequently that team members stress when they get a meeting notice with “Root Cause Analysis” in the subject line.
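One way to make such a threshold explicit is to write it down as a small rule. The sketch below is only an illustration under stated assumptions: the names `Incident`, `RcaThreshold`, and `needs_rca`, and the default limits, are hypothetical and not taken from any particular tool. It simply encodes the three triggers named above (financial loss, client-facing impact, frustrated customers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A live issue. Every field is a hypothetical example attribute."""
    financial_loss: float      # estimated loss, in the company's currency
    client_facing: bool        # did clients see the issue?
    customer_complaints: int   # complaints attributed to the issue


@dataclass(frozen=True)
class RcaThreshold:
    """Assumed, company-specific limits. Tune them to your own risk tolerance."""
    min_financial_loss: float = 1000.0
    min_customer_complaints: int = 5


def needs_rca(incident: Incident, threshold: RcaThreshold) -> bool:
    """Return True when any one of the three triggers is met."""
    return (
        incident.financial_loss >= threshold.min_financial_loss
        or incident.client_facing
        or incident.customer_complaints >= threshold.min_customer_complaints
    )


# Usage: a minor internal bug does not call a meeting; a costly one does.
threshold = RcaThreshold()
minor_bug = Incident(financial_loss=0.0, client_facing=False, customer_complaints=1)
costly_bug = Incident(financial_loss=25000.0, client_facing=False, customer_complaints=0)
print(needs_rca(minor_bug, threshold))
print(needs_rca(costly_bug, threshold))
```

Writing the rule down, in code or in a policy document, keeps the decision to meet consistent and makes it easy to revisit the limits if meetings become too frequent or too rare.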
The team should be a combination of organizational administrative roles and technical resources. They need to be able to drive the conversation by continuing to ask “how can we stop this from happening again?” while maintaining a level of expertise that truly allows the team to get to the root of the problem. The purpose of the team determines the nature of its work: it is there to drive change that reduces risk, not to assign blame. When team members are fearful of ramifications, they will not be transparent. Avoiding risk is about evolving. Evolution requires team members to change and, more importantly, to know that their leaders support and believe in their ability to learn. A “criminal investigation unit” that is out to establish blame will not get the result you are looking for. When you want to evolve as an organization, everyone inside and outside of the team must buy into the idea that mistakes are an opportunity for learning.
The team should have avenues to inform or train team members. They should be able to make the call distinguishing between a knowledge gap and a rare miss. They should have the support of work management software administrators, who can help them close workflow gaps by adding or changing steps. They should have the respect and buy-in of leaders. These leaders should be prepared to help engineer change, believing that what the RCA team is doing will have a lasting impact on company culture and quality rather than creating a fear of reprisal or dismissal.
The tools available to the team are incredibly important. What is the organization’s source of truth? If all questions and answers lie in a tool like Jira, then the team should start there. The organization’s source of truth illuminates questions but does not necessarily answer them, so members of an RCA team will need to do some discovery work between meetings. The discovery work should be tracked along with the desired takeaways. The team itself should look to improve by revisiting issues when necessary. Additionally, performance tracking tools, marketing tools, and system uptime monitors will be necessary, and subject matter experts for each should be represented within the team.
The team’s ability to evolve to solve new and varied problems is incredibly important. As issues are solved, new and more complex issues will arise as a natural part of an organization’s evolution. The team should be comfortable gaining and losing members based on availability, the need for fresh perspectives, and the skill sets necessary to gain visibility into a variety of problems. They should be an autonomous, pragmatic, success-based group that grows and changes based on the frequency and nature of the problems they are facing.
There is no way to improve without understanding your faults. There is no adapting to the future without changing what isn’t working. Most importantly, no corporation can be fulfilling unless it looks openly and honestly at what it’s doing wrong so it can change for the better. Building an honest, transparent workplace that learns from its mistakes is the only way to evolve.