Overview of Failure Mode Risk Analysis

When you evaluate a failure mode, you can quantify its relative risk (criticality) by assigning values to three factors: the consequences (severity of the effect), the probability of the failure occurring, and (optionally) the failure’s detectability. APM then calculates the relative risk by multiplying the severity value by the probability value and, if detectability is used, by the detectability value.
Once the relative risk is established, APM calculates the failure mode’s consequence priority using a set of customer-defined rules. The consequence priority rules can be based on the failure mode’s severity, relative risk, downtime costs, downtime duration, or a combination of these. For example, the Extreme priority could be assigned to failure modes whose total severity is equal to 5.0.
After you have analyzed the failure modes, you can compare failure modes and identify the relative importance of addressing them. The Risk Assessment view in the Strategy Development Analysis window includes failure mode lists based on criticality, consequence priority, severity, and relative risk, as well as a risk plot, risk matrix, and lists of the evaluations. This view is also available for the asset.
APM provides two ways to perform failure mode risk analysis:
With both methods, APM calculates the relative risk and displays it in the risk matrix chart. The method available in the Maintenance Action Plan window depends on the option selected in the analysis’ risk analysis settings.
Before you can perform risk analysis, the probabilities, severities, evaluation forms, consequence priorities, confidence factors, and risk matrix entries must be set up in the site’s Strategy Development settings. For more information, see Risk Analysis Settings.
The following sections explain risk analysis concepts in more detail.

Risk (Criticality)

The risk number for a failure mode is the product of the failure mode’s total consequence severity, the probability of failure, and (optionally) the failure mode’s detectability.
Risk = Severity * Probability * Detectability
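The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not APM code: the function name relative_risk and its parameters are hypothetical, and the detectability factor is simply skipped when it is not supplied.

```python
from typing import Optional


def relative_risk(severity: float, probability: float,
                  detectability: Optional[float] = None) -> float:
    """Return severity * probability, times detectability when it is used."""
    risk = severity * probability
    if detectability is not None:
        # Detectability is optional; apply it only when a value is given.
        risk *= detectability
    return risk


print(relative_risk(4.0, 3.0))       # detectability not used
print(relative_risk(4.0, 3.0, 2.0))  # detectability used
```

With a severity of 4 and a probability of 3, the risk is 12 without detectability and 24 with a detectability score of 2.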

Severity

Failure severity measures the consequences when a failure occurs. Severity can be described in terms of health and safety, environmental, reputation, and economic categories and is usually described as:
An impact statement and numerical value are associated with each severity value defined in APM. The higher the number, the more severe the effect. Economic impact can also be associated with each severity value to help determine avoidance savings and maintenance feasibility.

Probability of Failure

The probability of failure is the likelihood that the asset will fail due to the failure mode. In RCM2 analysis, there are two choices for evaluating probability:
With either method, APM usually describes probability with terms like high, medium, low, or negligible.

Detectability

“Detectability” refers to the ability of the system or process to detect a hazardous event. Lower scores are used for failures that are easy to detect and higher scores for failures that are harder to detect. When support for detectability is enabled in an analysis, the relative risk calculation becomes:
(Economic severity score + Health and safety severity score + Environmental severity score + Reputation severity score) * Probability score * Detectability score = Relative Risk Number
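As a worked illustration of this calculation, the following sketch sums the four severity category scores and multiplies the total by the probability and detectability scores. The class and function names, and the sample scores, are assumptions made for the example and do not come from APM.

```python
from dataclasses import dataclass


@dataclass
class SeverityScores:
    """Hypothetical container for the four severity category scores."""
    economic: float
    health_and_safety: float
    environmental: float
    reputation: float

    def total(self) -> float:
        # Total severity is the sum of the category scores.
        return (self.economic + self.health_and_safety
                + self.environmental + self.reputation)


def relative_risk_number(scores: SeverityScores, probability: float,
                         detectability: float) -> float:
    """Total severity * probability score * detectability score."""
    return scores.total() * probability * detectability


# Sample values chosen only for illustration.
scores = SeverityScores(economic=3.0, health_and_safety=2.0,
                        environmental=1.0, reputation=1.0)
print(relative_risk_number(scores, probability=4.0, detectability=2.0))
```

Here the total severity is 7, so the relative risk number is 7 * 4 * 2 = 56.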

Consequences

For each important failure mode, the severity of the consequences of failure must be determined. Consequences are categorized as economic, health and safety, environmental, or reputation.
Each of these consequence categories is assigned a value. The sum of the values in all categories is used in the calculation of the consequence assessment.

Economic Consequences

The economic consequence of failure reflects the financial effect of the failure on assets and production. Labor and material costs associated with lost production and with repairing or replacing the damaged equipment are economic consequences.
Costs associated with health and safety or environmental consequences are not included as economic consequences. For example, clean-up costs incurred as a result of a spill are not considered an economic consequence.

Health and Safety Consequences

Equipment failure can cause hazards in the workplace. Examples include extreme temperatures, extreme pressure, and noxious fumes. It is important to note that mitigating factors are considered when assessing health and safety consequences:
The following are examples of health and safety consequences:

Environmental Consequences

Environmental incidents are an important category to consider when determining overall consequence. Typically, two types of environmental incidents are considered:
Several levels of consequence evaluation are taken into account, as shown in the table below:

Reputation Consequences

Consequences to reputation measure the impact that negative media attention has on an organization’s ability to operate in good faith. Typically, the severity of bad press is evaluated in terms of how far-reaching it is and how long it takes to mitigate, as shown in the table below:

Failure Mode Consequence Priorities

APM calculates a consequence priority for the failure mode during risk analysis or for the failure during RCA evaluation. Consequence priorities allow you to rank and compare an asset’s failure modes and failures. In failure analysis, the consequence priority is used in the calculation that determines whether the failure is suitable for RCA.
The rules defined for a priority can be based on any of these properties:
Downtime cost – the total downtime cost of the failure mode or failure is used. The total downtime cost is the downtime occurrence cost plus the product of the downtime rate and the downtime duration:
Downtime Cost = Downtime Occurrence Cost + (Downtime Rate * Downtime Duration)
Relative risk (risk analysis only)
Severity, which can include the sum, minimum, or maximum value for any or all of:
The failure mode is assigned the highest-ranking consequence priority whose rules it satisfies.
As an example, consider a set of three consequence priorities. To simplify the example, the rules are based on a single property (total severity). In practice, the rules can be more complex and involve multiple properties and rule groups.
A failure mode with a severity of 14 is assigned the consequence priority High. Although it satisfies the rules for each of the consequence priorities, it is assigned High because that is the highest-ranking priority.
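The selection logic and the downtime cost formula can be sketched as follows. The priority names match the example, but the severity thresholds (12, 6, and 0) are hypothetical values chosen so that a severity of 14 satisfies every rule. The FailureMode class, the PRIORITIES list, and assign_priority are illustrative names only, not part of APM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FailureMode:
    total_severity: float
    downtime_occurrence_cost: float = 0.0
    downtime_rate: float = 0.0      # cost per unit of time
    downtime_duration: float = 0.0  # same time unit as the rate

    @property
    def downtime_cost(self) -> float:
        # Downtime Cost = Occurrence Cost + (Rate * Duration)
        return (self.downtime_occurrence_cost
                + self.downtime_rate * self.downtime_duration)


Rule = Callable[[FailureMode], bool]

# Ordered from highest ranking to lowest; thresholds are assumptions.
PRIORITIES: List[Tuple[str, Rule]] = [
    ("High", lambda fm: fm.total_severity >= 12),
    ("Medium", lambda fm: fm.total_severity >= 6),
    ("Low", lambda fm: fm.total_severity >= 0),
]


def assign_priority(fm: FailureMode,
                    priorities: List[Tuple[str, Rule]] = PRIORITIES
                    ) -> Optional[str]:
    """Return the first (highest-ranking) priority whose rule is satisfied."""
    for name, rule in priorities:
        if rule(fm):
            return name
    return None


print(assign_priority(FailureMode(total_severity=14)))  # satisfies all rules
print(FailureMode(0, 1000.0, 200.0, 5.0).downtime_cost)
```

Because the list is checked from the highest-ranking priority down, a failure mode that satisfies several rules receives the highest one, which is why a severity of 14 yields High under these assumed thresholds. A rule based on downtime cost could be added in the same way, for example `lambda fm: fm.downtime_cost >= 50000`.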

Risk Matrix

Criticality or risk is defined as the combination of two parameters: the likelihood or probability of failure and the consequences of failure. The risk matrix displays this combination, typically with four or five criticality levels and four or five probability levels. For example:
This example contains the following probability, consequence, priority, and criticality values.

Probability of Failure

Possible values are:
Frequent (<1 year) – Very High (5.00)
Probable (1-3 years) – High (4.00)
Possible (3-10 years) – Medium (3.00)
Unlikely (10-30 years) – Low (2.00)
Remote (> 30 years) – Very Low (1.00)
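The probability scale above can be expressed as a simple lookup on the expected number of years between failures. This is a sketch only: the function name is hypothetical, and the handling of boundary values (for example, treating exactly 3 years as Possible and exactly 30 years as Unlikely) is an assumption, because the listed ranges overlap at their endpoints.

```python
from typing import Tuple


def probability_rating(years_between_failures: float) -> Tuple[str, str, float]:
    """Map an expected failure interval to (name, level, score)."""
    if years_between_failures < 1:
        return ("Frequent", "Very High", 5.0)
    if years_between_failures < 3:
        return ("Probable", "High", 4.0)
    if years_between_failures < 10:
        return ("Possible", "Medium", 3.0)
    if years_between_failures <= 30:
        return ("Unlikely", "Low", 2.0)
    return ("Remote", "Very Low", 1.0)


for years in (0.5, 2, 5, 20, 40):
    print(years, probability_rating(years))
```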

Consequence Categories

Possible values are Environmental, Operational, Safety, and Reputation.

Consequences

Possible values are:
Negligible – Any of (Environmental, Operational, Safety, Reputation) is equal to 1.00
Low – Any of (Environmental, Operational, Safety, Reputation) is equal to 2.00
Medium – Any of (Environmental, Operational, Safety, Reputation) is equal to 3.00
High – Any of (Environmental, Operational, Safety, Reputation) is equal to 4.00
Severe – Any of (Environmental, Operational, Safety, Reputation) is at least 5.00
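These rules can be evaluated with a small sketch like the one below. It assumes the levels are checked from Severe down to Negligible, so that a failure mode matching several rules receives the highest-ranking one, as described earlier for consequence priorities. The names LEVELS and consequence_level, and the sample scores, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Ordered from highest ranking to lowest. Each rule is applied to every
# category score, and a level matches if any category satisfies it.
LEVELS: List[Tuple[str, Callable[[float], bool]]] = [
    ("Severe", lambda v: v >= 5.0),
    ("High", lambda v: v == 4.0),
    ("Medium", lambda v: v == 3.0),
    ("Low", lambda v: v == 2.0),
    ("Negligible", lambda v: v == 1.0),
]


def consequence_level(scores: Dict[str, float]) -> Optional[str]:
    """Return the highest-ranking level matched by any category score."""
    for name, matches in LEVELS:
        if any(matches(value) for value in scores.values()):
            return name
    return None  # no rule matched (for example, a score of 0)


sample = {"Environmental": 2.0, "Operational": 4.0,
          "Safety": 1.0, "Reputation": 1.0}
print(consequence_level(sample))
```

In the sample, the Operational score of 4.00 matches the High rule before the lower-ranking Low and Negligible rules are considered, so the result is High.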

Criticality Rating

The criticality that is selected is the most severe combination of consequence priority and probability. For example, the Medium-high (MH) criticality is selected when the consequence priority is High and the probability of failure is Medium.
The ratings require corresponding responses. Negligible criticality can result in no formal inspection plan. Failure modes with low to medium-high criticalities usually require inspection plans. High and extreme criticalities can call for detailed analysis that can result in equipment redesign or further mitigating factors.

Risk Plot Chart

Risk plot charts provide a visual representation of an asset’s unmitigated (or initial) risk based on severity (consequence) and probability of failure. The risk value is determined by asset prioritization or strategy development analysis. If a feasibility evaluation is performed on a failure mode, the chart also shows the mitigated (or residual) risk.
The risk plot chart can appear in the following locations:
Asset Prioritization Analysis window, Worksheet and Summary views
Maintenance Action Plan window, Criticality and Feasibility views (MTA2, RCM2, RBI analyses)
Strategy Development Analysis window, Risk Assessment view, Risk Plot tab
Risk plot lines indicate risk tolerance levels to provide context for analysis results. Tolerance levels can be shown as a single plot line or as color areas. Here is an example of a chart that shows a single risk plot line:
Risk plot charts support as many as five color areas (named Extreme, High, Medium, Low, Negligible). The default colors for the areas provided with APM are purple, red, orange, yellow, and green, respectively. Here is an example of a chart from a strategy development analysis: