What is Failure Tracking?

Failure tracking is the practice of recording information about equipment failures. Any event in which equipment cannot perform its function within the limits and under the conditions required of it is considered a failure. The severity of a failure can range from potential to full failure.
APM uses the failure record’s date information to calculate statistics (such as time between failures) that measure equipment reliability and maintainability. Failure tracking also provides a way to measure the savings gained by avoiding failures, and it allows you to make informed decisions when targeting assets for improvement.
This topic provides an overview of failure tracking concepts.

Contents

Failure and Anomaly Records
Failure Severity
PF Intervals
Failures and Root Cause Analyses
Failure Follow-up Actions

Failure and Anomaly Records

In APM, a failure record can be created for each occurrence of an equipment failure or anomaly. The record documents the dates when the failure or anomaly occurred, was reported, and was resolved. The record includes the failure severity and, in the case of potential and partial failures, the PF interval: an estimate of the amount of time before a full failure occurs.
The record tracks the failure through to its resolution, documenting how the failure was identified (for example, an indicator reading that resulted in an alarm), the steps taken to avoid or correct the problem (for example, work order tasks), any delays that occurred (for example, time needed to get parts), downtime incidents, and the savings achieved by avoiding a more severe failure.
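The dates on failure records are what make statistics such as time between failures possible. The following is a minimal sketch of that idea, not APM's actual calculation: the record layout, the sample dates, and the function names are all hypothetical, and definitions of these statistics vary between organizations.

```python
from datetime import datetime
from statistics import mean

# Hypothetical failure records for one asset: (occurred, resolved) timestamps.
failures = [
    (datetime(2024, 1, 10, 8, 0), datetime(2024, 1, 10, 20, 0)),
    (datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 2, 8, 0)),
    (datetime(2024, 4, 20, 8, 0), datetime(2024, 4, 20, 14, 0)),
]


def mean_time_between_failures_days(records):
    """Average gap, in days, between consecutive failure occurrence dates."""
    occurred = sorted(start for start, _ in records)
    gaps = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(occurred, occurred[1:])
    ]
    return mean(gaps) if gaps else None


def mean_time_to_resolve_hours(records):
    """Average time, in hours, from occurrence to resolution."""
    durations = [(end - start).total_seconds() / 3600 for start, end in records]
    return mean(durations) if durations else None


print(mean_time_between_failures_days(failures))  # 50.5
print(mean_time_to_resolve_hours(failures))       # 14.0
```

The first figure is a simple reliability measure (how often the asset fails) and the second a simple maintainability measure (how long a failure takes to resolve).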
Failure records can be created:
Note: In the Acknowledge Indicator Alarm dialog, the Create or link to a failure or anomaly option is selected by default when the alarm state requires that a failure record be created. When this option is selected, the Failure tab appears in the dialog. You can clear the option if needed, or you can select it even when the alarm state does not require a failure record.
Besides indicator alarm states, the acknowledgment policies in the site’s inspection management settings affect whether failure records are created. These settings specify which acknowledgment methods allow failure records to be created.
You can also create failure actions and record delays and downtime on failure records. Downtime incidents can be created automatically at the same time that failure records are generated.
When APM uses AssetWise Enterprise Interoperability (AWEIS) to exchange data with an external CMMS, failure records can be created or updated from the events associated with interop work requests, interop work orders, or both. For more information, see Overview of Work Management with AWEIS.

Failure Severity

A failure’s severity is used to indicate the seriousness of the failure. Three severity types are supported:
Potential: a condition has been noticed that will result in an actual failure if the problem is not resolved. The actual failure has not yet occurred. Potential failures are normally reported as the result of an indicator reading that has raised a warning alarm against the asset.
Partial: the asset’s performance has decreased to a point where it is no longer performing one of its functions at the specified levels. The asset is still functioning.
Total: the asset’s performance has decreased to a point where the asset is no longer performing at its required level. The asset has completely failed.
You can create failure severities as required for your organization.

PF Intervals

The PF interval is the estimated amount of time before a potential or partial failure could escalate to a full failure if the problem is not resolved beforehand.
PF interval is usually tracked in units of time (hours, days, weeks, and so on). However, an asset that has a cumulative primary indicator can use a PF interval measured in indicator values, such as cycles or operating hours.
For time-based PF intervals, the following values are tracked on the failure record:
Original PF interval: The PF interval as of the date and time when the failure occurred.
PF Interval: The current PF interval if the original value has been changed manually.
Elapsed time: The time that has elapsed since the failure occurred.
Remaining PF interval: The time remaining until a full failure will occur. If the failure is not resolved beforehand, the remaining PF interval can be a negative value.
Expected failure date: The date on which the remaining PF interval will reach 0.
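The relationship between these time-based values can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the 90-day interval and the dates are assumed sample values rather than values taken from APM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TimeBasedPF:
    occurred: datetime      # date and time when the failure occurred
    pf_interval: timedelta  # current PF interval (original or manually adjusted)

    def expected_failure_date(self) -> datetime:
        """Date on which the remaining PF interval reaches 0."""
        return self.occurred + self.pf_interval

    def elapsed(self, as_of: datetime) -> timedelta:
        """Time that has elapsed since the failure occurred."""
        return as_of - self.occurred

    def remaining(self, as_of: datetime) -> timedelta:
        """Time left until the expected full failure; negative once overdue."""
        return self.pf_interval - self.elapsed(as_of)


# Assumed sample values: failure on June 1 with a 90-day PF interval.
pf = TimeBasedPF(occurred=datetime(2024, 6, 1), pf_interval=timedelta(days=90))

print(pf.expected_failure_date())                # 2024-08-30 00:00:00
print(pf.remaining(datetime(2024, 7, 1)).days)   # 60
print(pf.remaining(datetime(2024, 9, 15)).days)  # -16
```

Elapsed time and the remaining interval depend on the date of the calculation, whereas the expected failure date depends only on the occurrence date and the PF interval.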
For PF intervals based on operating hours or cycles, the following values are tracked on the failure:
Original PF interval: The PF interval, as derived from the asset indicator alarm state as of the date when the failure occurred.
PF Interval: The current PF interval if the original value has been changed manually.
Elapsed value: The value that has accumulated against the indicator since the failure occurred.
Remaining PF interval: The value remaining until a full failure will occur. If the failure is not resolved beforehand, the remaining PF interval can be a negative value.
Expected failure date: The date on which the remaining PF interval will reach 0, based on the value at which the full failure will occur and the indicator’s average accumulation.
Expected failure value: The indicator value at which a full failure is expected to occur, that is, the value at which the remaining PF interval will reach 0.
Note: If you replace or remove an asset’s primary cumulative indicator after failure records have been created, the failure records are not affected by the change. They continue to use the original indicator for calculating failure statistics, PF intervals, or both.

Recording the Initial PF Interval

The PF interval is recorded when the failure record is created. If the failure was created from an indicator alarm acknowledgment, the PF interval on the indicator state is copied to the failure record. You can adjust the copied value to reflect your best estimate of the time that elapsed between the failure first becoming noticeable and the reading being taken.
For example, an indicator state has a PF interval of four months (120 days), and the indicator is read every 60 days. When a failure is created from the indicator and state, the PF interval on the failure defaults to four months. If the reading value indicates that the failure started to occur 20 days ago, you could manually adjust the PF interval to 100 days.
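The adjustment in this example is a simple subtraction, sketched below. The function name and its parameters are hypothetical; in APM the adjustment is made manually on the failure record.

```python
def adjusted_pf_interval(state_pf_days: int, days_since_onset: int) -> int:
    """Reduce the indicator state's PF interval by the estimated number of
    days since the failure first became noticeable."""
    if days_since_onset < 0:
        raise ValueError("days_since_onset must not be negative")
    return state_pf_days - days_since_onset


# Values from the example: 120-day PF interval, onset estimated 20 days ago.
print(adjusted_pf_interval(120, 20))  # 100
```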

Updating Remaining PF Intervals

The time remaining for a time-based PF interval and the expected failure date are usually calculated by a scheduled action. (PF intervals can also be recalculated manually.) Each time the action runs, the elapsed time and remaining interval are recalculated. The expected failure date remains constant. For best results, run the scheduled action daily, or more often if PF intervals are shorter than one day.
The following table shows an example of a PF interval reported on June 1 for a partial failure that occurred on the same date. The PF interval is recalculated on the first of each subsequent month. Note that the elapsed time and remaining PF interval change, but the expected failure date does not.
Indicator-based PF intervals are recalculated each time a reading is entered for the indicator. For example, a failure is reported on June 1 with a PF interval of 100 cycles. At the time of the reading, the asset’s cycle count is 15,000. Based on the PF interval and the indicator’s average usage of two cycles per day, an expected failure date of July 20 is calculated.
The next reading of the indicator is entered on June 15. The reading on this date is 15,040 cycles. The asset has been consuming cycles at a faster rate than average. This results in a new expected failure date of July 15. Note that the expected failure at 15,100 cycles has not changed. This value is fixed, but the date on which the value is expected to be reached changes.
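The June 15 recalculation can be reproduced with a short sketch. The names below are hypothetical, the year is assumed, and the average of two cycles per day is taken from the example as given. How APM derives the average accumulation and how it counts days (for example, whether the reading day itself is included) are not described here, so treat the date arithmetic as one plausible reading.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class IndicatorPFStatus:
    elapsed_value: float
    remaining_pf_interval: float
    expected_failure_value: float
    expected_failure_date: date


def recalculate(value_at_failure, pf_interval, reading_value, reading_date, avg_per_day):
    """Recalculate an indicator-based PF interval from a new reading."""
    if avg_per_day <= 0:
        raise ValueError("avg_per_day must be positive")
    expected_failure_value = value_at_failure + pf_interval  # fixed
    elapsed_value = reading_value - value_at_failure
    remaining = pf_interval - elapsed_value
    days_left = remaining / avg_per_day
    return IndicatorPFStatus(
        elapsed_value,
        remaining,
        expected_failure_value,
        reading_date + timedelta(days=days_left),
    )


# Reading of 15,040 cycles entered on June 15 (year assumed).
status = recalculate(15_000, 100, 15_040, date(2024, 6, 15), avg_per_day=2)
print(status.remaining_pf_interval)   # 60
print(status.expected_failure_value)  # 15100
print(status.expected_failure_date)   # 2024-07-15
```

The expected failure value depends only on the value at the time of failure and the PF interval, so new readings move the expected date but not the expected value.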

Failures and Root Cause Analyses

You can use APM to evaluate a failure’s suitability for root cause analysis, based on the failure’s severity, consequence priority, and probability of recurring. APM calculates the criticality index as follows:
Criticality Index = Consequence priority * Failure Severity * Probability
The value of the criticality index in turn determines whether RCA is required, recommended, or not required.
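As an illustration of the evaluation, the sketch below multiplies the three factors and maps the result to an RCA outcome. The numeric scales, the threshold values, and the assumption that a higher index means a stronger case for RCA are all hypothetical; the actual scales and thresholds are defined by your APM configuration.

```python
def criticality_index(consequence_priority, failure_severity, probability):
    """Criticality Index = Consequence priority * Failure Severity * Probability."""
    return consequence_priority * failure_severity * probability


def rca_recommendation(index, required_at=50, recommended_at=20):
    """Map an index to an RCA outcome using hypothetical thresholds."""
    if index >= required_at:
        return "required"
    if index >= recommended_at:
        return "recommended"
    return "not required"


# Hypothetical ratings for three failures.
for ratings in [(5, 3, 4), (2, 3, 4), (1, 2, 3)]:
    index = criticality_index(*ratings)
    print(ratings, index, rca_recommendation(index))
# (5, 3, 4) 60 required
# (2, 3, 4) 24 recommended
# (1, 2, 3) 6 not required
```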
If the evaluation determines that RCA is warranted, you can create the analysis from the failure record or request that an RCA be performed.

Failure Follow-up Actions

You can define failure follow-up actions, for example, a mitigation action to provide a temporary solution until the failure can be analyzed and resolved. Each action includes a sequence number, action type, description, due date, employee assignment, and action status. Each action thus provides a record that can be tracked by due date, status, and owner.
Each failure follow-up action is categorized by type:
The objects that you add from follow-up actions reference the action. For example, mitigation properties include a link to the source follow-up action.
You can view a list of failure follow-up actions for the site. In the Performance Management view, select the Failures tab. Select “Failure actions by due date” in the configuration list.