The DØ Experiment Significant Event System

S. Ahn, S. Fuess (presenter), J.F. Bartlett,
S. Krzywdzinski, L. Paterno
Fermi National Accelerator Laboratory

L. Rasmussen
State University of New York, Stony Brook

The DØ Experiment at Fermilab is finishing approximately three years of data taking at the Tevatron. During this time, a Significant Event System has played an integral part in data acquisition operations, managing the receipt and distribution of alarms, heartbeats, and run state transitions.

The heart of the Significant Event System is the Alarm Server application, resident on a host VAX-VMS cluster. The Alarm Server receives alarms from front end processors, from various control and monitoring path applications, and from data path applications. The front ends are IBM PC and Motorola 68000 family processors connected via a Token Ring network. A Microvax running ELN operates as a Gateway between Token Ring and Ethernet. The Gateway application byte-swaps the messages that traverse it, according to a data format block specified by the message passing protocol. The front ends broadcast alarm messages using a group functional address, which the Gateway recognizes; it then forwards those messages via DECNET to the Alarm Server.
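The format-driven byte swapping performed by the Gateway can be illustrated with a minimal sketch. The layout of the actual data format block is not given here, so the sketch assumes, purely for illustration, that it reduces to a list of field widths in bytes; the name `swap_fields` is hypothetical.

```python
from typing import Sequence


def swap_fields(payload: bytes, field_widths: Sequence[int]) -> bytes:
    """Reverse the byte order of each field in a message payload.

    ASSUMPTION: the "data format block" is modeled as a plain list of
    field widths in bytes. The real protocol's format block is not
    described in the text and may look quite different.
    """
    if sum(field_widths) != len(payload):
        raise ValueError("field widths do not cover the payload exactly")

    swapped = bytearray()
    offset = 0
    for width in field_widths:
        # A one-byte field is unchanged; wider fields are reversed.
        swapped += payload[offset:offset + width][::-1]
        offset += width
    return bytes(swapped)


# A 2-byte field, a 4-byte field, and a single byte.
message = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07])
converted = swap_fields(message, [2, 4, 1])
```

Driving the swap from a per-message format description lets a gateway convert messages without knowing what each field means.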

Approximately 30 critical applications send periodic heartbeats to the Alarm Server. If a heartbeat does not arrive on time, the Alarm Server generates an alarm internally. Data acquisition state information is also distributed through the Alarm Server.
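The heartbeat check amounts to a watchdog over the time each application was last heard from. The following is a minimal sketch of that idea, not the Alarm Server's actual logic; the class name `HeartbeatMonitor`, the single shared timeout, and the application names are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Track the last heartbeat time per application (illustrative only).

    ASSUMPTION: every application shares one timeout. The real system
    may well use per-application intervals.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen = {}  # application name -> time of last heartbeat

    def beat(self, app: str, now: float) -> None:
        """Record a heartbeat received from `app` at time `now`."""
        self.last_seen[app] = now

    def overdue(self, now: float) -> list:
        """Return, sorted by name, the applications whose heartbeat is late."""
        return sorted(
            app
            for app, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.beat("logger", now=0.0)    # hypothetical application names
monitor.beat("trigger", now=5.0)
late_at_12 = monitor.overdue(now=12.0)  # only "logger" has been silent > 10 s
```

Each name returned by `overdue` would correspond to one internally generated alarm. Passing the time in explicitly, rather than reading a clock inside the class, keeps the check easy to test.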

The Alarm Server then distributes information to various instances of specialized Display and Logging processes. The Alarm Display is an updating MOTIF display which categorizes the current system state. Active buttons on the display allow a user to obtain more information, either internally contained within the significant event messages or from associated database entries. The Alarm Logger writes a sequential file of significant event information which can later be queried with a SQL-like browser.

An important component of the Significant Event System is an application which receives both data acquisition and alarm state information. This application, upon indication of a severe fault in a critical experiment component, instructs the data acquisition system to interrupt data flow. Once corrective action has been taken and data acquisition has resumed, the logged downtime information can be used to track the frequency of problems.
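The pause-and-log behavior described above can be sketched as a small state tracker. This is an illustration under stated assumptions, not the experiment's implementation: the class name `DowntimeTracker`, the `"severe"` severity label, and the component names are hypothetical.

```python
class DowntimeTracker:
    """Decide when to pause data flow and record downtime intervals.

    ASSUMPTIONS: severities are plain strings, only "severe" alarms on
    components listed as critical cause a pause, and one pause is
    attributed to the first component that triggered it.
    """

    def __init__(self, critical_components) -> None:
        self.critical = set(critical_components)
        self.paused_since = None  # start time of the current pause, if any
        self.cause = None         # component blamed for the current pause
        self.intervals = []       # completed (component, start, end) tuples

    def on_alarm(self, component: str, severity: str, now: float) -> bool:
        """Return True if this alarm should interrupt data flow."""
        if (
            severity == "severe"
            and component in self.critical
            and self.paused_since is None
        ):
            self.paused_since = now
            self.cause = component
            return True
        return False

    def on_resume(self, now: float) -> None:
        """Close the open downtime interval when data acquisition resumes."""
        if self.paused_since is not None:
            self.intervals.append((self.cause, self.paused_since, now))
            self.paused_since = None
            self.cause = None

    def seconds_lost(self) -> dict:
        """Total downtime per component, computed from the logged intervals."""
        totals = {}
        for component, start, end in self.intervals:
            totals[component] = totals.get(component, 0.0) + (end - start)
        return totals


tracker = DowntimeTracker(critical_components=["high_voltage", "cryogenics"])
pause_requested = tracker.on_alarm("high_voltage", "severe", now=100.0)
tracker.on_resume(now=160.0)
lost = tracker.seconds_lost()
```

Keeping the raw intervals, rather than only running totals, allows later queries on both the frequency and the duration of problems per component.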

The current system is built with PASCAL and FORTRAN applications using a Client/Server package layered over a DECNET communications package. We will discuss our experience operating this system and our plans for future upgrades.