Root Cause Analysis

Investigating in Brazil
David Ramsay

Root Cause Analysis is one of the most misunderstood terms. So many companies and individuals request Root Cause Analysis (RCA) training when what they actually mean is that they would like to learn how to investigate incidents, accidents and failures.

This point was reinforced for me when recently visiting an oil and gas sector company in the USA. The safety professionals in this large organisation used a competing methodology to TOP-SET®, which they described as a Root Cause Analysis tool. They said that they didn't have an investigation system. In my view, they were quite correct; there is a major difference between doing a full investigation to gather all of the facts followed by an analysis, and going straight to the RCA without all of the necessary information. If you do that, you can have no confidence in the results and in the subsequent recommendations.

Also, quite recently, I have seen very large and detailed engineering fault trees based on 'What if' scenarios. Sadly, the person who commissioned these had confused them with the RCA diagrams that he actually needed. He needed RCAs based on the facts rather than the possibilities.

In the view of all of my senior investigation team, the most difficult part of the RCA is getting started. This starting point is the identification of the immediate causes. Once these are in place, every other step seems to follow quite easily. However, get these wrong and the whole diagram and its associated recommendations will also be incorrect.

Some of our lead investigators use the term 'Loss of Control' to help them identify the immediate causes. What they are really doing is asking themselves the question 'What were the factors, from the information that I already have, that caused control of the situation to be lost?'

If we use the sad but common example of a young person (usually a male) who is killed by crashing his car, you will find that he lost control of his vehicle by the common factors of 'lack of experience' and 'excessive speed in the prevailing conditions' - these are the immediate causes. You can apply the same approach in many situations. In one industrial investigation, I started from the position of process control being lost to help me establish the immediate causes.

Probably a better term to use than 'immediate cause' is 'Active Failure', as this leads you to consider what activity or activities occurred to enable the incident to happen. There always has to be an action.

Consider one of the most frequent incidents that we hear about: a person hit by a falling object. We will find that we have 'the falling object or load' and 'the person standing or working under the load'. Both of these are active events without which the incident could not have occurred. Take one away and that particular incident becomes impossible.
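The 'take one away' test can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the event labels and the function name incident_occurs are hypothetical and are not part of any investigation tool.

```python
# Hypothetical sketch: an incident needs ALL of its active failures to be present.
# The event labels below are illustrative, taken from the falling-object example.
ACTIVE_FAILURES = frozenset({"object falls", "person under load"})


def incident_occurs(events_present):
    """Return True only if every active failure is among the events present."""
    return ACTIVE_FAILURES <= set(events_present)


# Both active failures present: the incident can happen.
print(incident_occurs({"object falls", "person under load"}))

# Take one away (nobody under the load): that particular incident is impossible.
print(incident_occurs({"object falls"}))
```

If removing an event from the set does not change the result, that event is probably not an active failure of the incident being analysed.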

Two of us were investigating an incident where a pleasure craft had an engine failure at sea. It took a lot of effort not to focus solely on the engine failure and instead to consider that the real incident was the risk to life from being in a drifting boat. Had we focused on the engine failure only, we would have omitted another crucial aspect of the incident: the lack of any alternative propulsion unit on the boat. The Active Failures were then seen as 'Engine Failure' and 'Loss of Propulsion'. This enabled us to explore and illustrate what steps led to the failures and what would have prevented them from occurring.

It is perhaps useful to think of an RCA as a 'Logic Flow Diagram' where one step follows logically from another. Ideally, your diagram should enable the reader to understand precisely what happened without reference to any other document. Of course it is necessary to have a full report, but a well-presented RCA with competent recommendations can often be all that a senior manager will actually read.

The next level down in the RCA is 'Underlying Causes'. Another way of thinking of these is that they were the 'General Conditions' that enabled the incident to occur. The causes are usually expressed as a short series of interdependent facts. Using one of the 'Active Failures' from the above boat example:
Loss of propulsion. Why? No backup engine. Why? Owner's decision. Why? Shortage of storage space for an outboard engine, and weight concerns.
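For readers who like to see the logic written down, the 'Why?' chain above can be sketched as a small tree in Python. This is a minimal sketch under my own assumptions: the Cause class and the why_chains function are hypothetical names invented for illustration, and only the cause statements come from the boat example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One fact on the diagram, plus the facts that answer 'Why?' for it."""
    statement: str
    because: List["Cause"] = field(default_factory=list)


def why_chains(cause, path=()):
    """Yield every path from the starting cause down to a fact with no deeper 'Why?'."""
    path = path + (cause.statement,)
    if not cause.because:
        yield path
    for deeper in cause.because:
        yield from why_chains(deeper, path)


# The 'Loss of propulsion' Active Failure from the boat example.
loss_of_propulsion = Cause("Loss of propulsion", [
    Cause("No backup engine", [
        Cause("Owner's decision", [
            Cause("Shortage of storage space for outboard engine"),
            Cause("Weight concerns"),
        ]),
    ]),
])

for chain in why_chains(loss_of_propulsion):
    print(" Why? ".join(chain))
```

Each printed line is one unbroken chain of facts. If a step does not follow logically from the one before it, the gap shows up as soon as the chain is read aloud.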

For anyone who has ever done factory, ship or oil rig visits, the idea of 'General Conditions' is fairly easy to understand. What you see are the indicators of how the operation is being run and hence the likelihood of it being a good, bad or safe operation, e.g. just think of the clue that poor housekeeping gives you.

Root Causes are what lie at the heart of the problem. This is what allowed the situation to arise in the first place and this will be in the areas of People and Organisation. It is here that the actions, or more often inactions, that led to the 'General Conditions' can be found. It is these Root Causes and General Conditions that enabled the incident to occur. Returning to the failed propulsion in the pleasure boat example, the Root Cause was that the owner valued speed, space and performance above safety at sea.

Some words of warning. Do be rigorous in identifying the 'Root Cause'. Everything can come down to management, but try to be specific. Also don't be seduced into thinking that because you have identified the technical failures you have found the 'Root Cause'; you haven't. Please recognise that this must be somewhere in the People and Organisational sectors.

As in all parts of the investigation, do your preparatory work on paper using 'Stickies' or 'Post-its' before entering it into the computer, as this will enable sharing and discussion with your team. This is an essential component of any investigation and subsequent analysis.

The final step is to write the recommendations, or S.M.A.R.T. Actions, and these should be tied back to specific facts on the RCA. For those of you using our 'Investigator 3' software, there is a specific number-tagging element that links the RCA to the recommendations in the final report.