Technical Blog
Putting the Rational in Alarm Rationalization
Alarm Rationalization is the process of reviewing the process control system alarms in a unit, deciding which are necessary and how to prioritize them. While this is a simple exercise in concept, it can be very time-consuming. Most clients have a clear understanding of their objectives: alarms should be used to get the attention of an operator, and a deviation should only be alarmed when it requires operator action to correct. Often there will be a series of rules that set pre-established priorities for many of the utility or diagnostic alarms, and these can be quickly categorized and prioritized. The challenge comes when you must handle the large number of operational and quality alarms which remain after the rules-based alarms have been settled.

For most control systems there are three levels of alarm priority, typically called something like Low, High, and Urgent (we will use these three for consistency). The purpose of alarm priorities is to give the board operator some guidance on which issue to address first. An operator who only gets one alarm only has one problem and can hopefully fix it before the next alarm. But when several alarms arrive together, the priority system exists to help the operator decide which to address first. Many alarm rationalization processes give the team little guidance beyond engineering judgement, or suggest that alarms be prioritized based only on the severity of the resulting event. We believe there is a second factor to consider, and that is the time delay between the alarm sounding and the undesirable event occurring.
The definition of this time delay is critically important. It is not how long it will probably take the operator to respond, or how quickly management expects the operator to respond, but the literal process safety time: how long it takes for the event to occur if the alarm is ignored by the board operator, and things progress to their inevitable and undesired end. Consider two events, both of which can result in a loss of primary containment and a potentially fatal fire or explosion. If event A can progress to a fire within 15 minutes of the alarm sounding, while event B will result in a fire after an 8-hour delay, it should be clear that you would prefer the operator address the 15-minute event first.
A good alarm rationalization approach should therefore use a matrix crossing event severity (typically five categories from worst to most minor) with four or five process safety time categories. We recommend a logarithmic time scale, similar to the one used to define the severity axis of the matrix. Your safety time categories might be set at 1 minute, 10 minutes, 100 minutes (about 1.7 hours) and 1000 minutes (about 17 hours), and you will need to develop a priority matrix which determines the alarm priority by cross-referencing the severity and the process safety time. This will resemble a PHA severity vs event frequency matrix.
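As a rough illustration of the lookup, here is a minimal Python sketch of such a matrix. The priority assigned to each cell is invented for the example and is not a recommendation; each site would develop its own matrix. The names PRIORITY_MATRIX and alarm_priority are hypothetical.

```python
# Process safety time categories in minutes, taken from the example in the text.
SAFETY_TIMES_MIN = [1, 10, 100, 1000]

# Hypothetical priority matrix (illustrative values only, not a recommendation).
# Keys are severity categories: 1 = worst, 5 = most minor.
# Each list runs from the shortest to the longest process safety time.
PRIORITY_MATRIX = {
    1: ["Urgent", "Urgent", "High", "High"],
    2: ["Urgent", "High", "High", "Low"],
    3: ["High", "High", "Low", "Low"],
    4: ["High", "Low", "Low", "Low"],
    5: ["Low", "Low", "Low", "Low"],
}


def alarm_priority(severity, safety_time_min):
    """Return the alarm priority for a severity and a safety time category."""
    column = SAFETY_TIMES_MIN.index(safety_time_min)
    return PRIORITY_MATRIX[severity][column]


# Two events of equal severity: the one with less time available ranks higher.
print(alarm_priority(2, 10))    # High
print(alarm_priority(2, 1000))  # Low
```

Keeping the matrix as plain data makes it easy for the review team to inspect and revise it without touching the lookup logic.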
Evaluating each alarm using the matrix becomes a rational exercise in evaluating the severity of the event resulting from ignoring the alarm and how long it will take that event to happen. This is an objective and supportable methodology for evaluating the necessary priority of an alarm, and it limits the influence of subjective factors. Engineering judgement and operating experience are important contributors, but you need an objective basis for evaluation to reduce the impact of observer bias and the potential post-traumatic stress of previous incidents.

Facilitating an alarm management review is like facilitating a PHA review, with one essential difference. In a PHA review you are trying to identify safeguards and layers of protection, so everything is assumed to fail, and the result of a high-pressure deviation is a loss of primary containment (LOPC) and fire or explosion. In alarm rationalization, everything is expected to work properly, and the result of a high-pressure deviation is the pressure safety valve (PSV) lifting and a flare release, so the severity of the final event is often much lower than the worst day expected in a PHA review. The basic approach is to identify the event which occurs if the alarm is ignored, determine the severity, then decide how long until that event happens, and reference the matrix to determine the priority. The challenges are in the details, like deciding between a 10-minute and a 100-minute process safety time. When trying to split the difference, remember that for a log scale the midpoint between 10 and 100 is about 32, so the question should be “more or less than 30 minutes”, and if it remains unclear, default to the shorter, and therefore more conservative, interval.
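The log-scale midpoint rule can be sketched in a few lines of Python. This is a minimal illustration and not part of any standard tool: the function names are hypothetical, the category values are the example ones of 1, 10, 100 and 1000 minutes, and the choice to send ties to the shorter interval is an assumption that follows the conservative default described above.

```python
import math

# Process safety time categories in minutes, taken from the example in the text.
SAFETY_TIMES_MIN = [1, 10, 100, 1000]


def log_midpoint(lower, upper):
    """Midpoint of two values on a logarithmic scale (their geometric mean)."""
    return math.sqrt(lower * upper)


def nearest_category(estimate_min):
    """Assign an estimated process safety time to a category.

    Estimates at or below the log midpoint go to the shorter, more
    conservative category. Estimates outside the range are clamped.
    """
    if estimate_min <= SAFETY_TIMES_MIN[0]:
        return SAFETY_TIMES_MIN[0]
    for lower, upper in zip(SAFETY_TIMES_MIN, SAFETY_TIMES_MIN[1:]):
        if estimate_min <= upper:
            if estimate_min <= log_midpoint(lower, upper):
                return lower
            return upper
    return SAFETY_TIMES_MIN[-1]


print(round(log_midpoint(10, 100), 1))  # 31.6
print(nearest_category(30))             # 10
print(nearest_category(45))             # 100
```

An estimate of 30 minutes falls just under the midpoint and is treated as a 10-minute event, while 45 minutes is treated as a 100-minute event.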
Meet the Author
Eric Nussberger – Senior Process Safety Engineer

Eric is a senior process safety engineer with over 25 years of experience in process design and troubleshooting, project conception and development, and advanced process control. He specializes in Safe Operating Limits (SOL) development, hazard reviews, and control system optimization.


