The D0 Experiment Significant Event System
S. Ahn, S. Fuess (presenter), J.F. Bartlett,
S. Krzywdzinski, L. Paterno
Fermi National Accelerator Laboratory
L. Rasmussen
State University of New York, Stony Brook
The D0 Experiment at Fermilab is finishing approximately three years of
data taking at the Tevatron. During this time a Significant Event System has
played an integral part in data acquisition operations, managing the receipt
and distribution of alarms, heartbeats, and run state transitions.
The heart of the Significant Event System is the Alarm Server application,
resident on a host VAX-VMS cluster. The Alarm Server receives alarms from
front end processors, various control and monitoring path applications, and
data path applications. The front ends are IBM PC and Motorola 68000 family
processors connected via a Token Ring network. A MicroVAX running
ELN operates as a Gateway between Token Ring and Ethernet. The Gateway
application performs appropriate byte swapping on messages which traverse it,
according to a data format block specified by the message passing protocol.
Alarm messages are broadcast by the front ends using a group functional
address, which the Gateway recognizes and forwards via DECNET to the Alarm
Server.
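The byte swapping performed by the Gateway can be illustrated with a minimal
sketch. It assumes a hypothetical data format block written as a string of
field-width codes (B = 1 byte, H = 2 bytes, L = 4 bytes); the actual layout
of the format block in the message passing protocol is not described here,
and the name swap_message is illustrative only.

```python
# Hypothetical format codes: each character gives the width in bytes of one
# field in the message payload. This encoding is an assumption for the sketch.
FIELD_WIDTHS = {"B": 1, "H": 2, "L": 4}


def swap_message(payload: bytes, format_block: str) -> bytes:
    """Reverse the byte order of every field described by format_block."""
    out = bytearray()
    offset = 0
    for code in format_block:
        width = FIELD_WIDTHS[code]
        field = payload[offset:offset + width]
        if len(field) != width:
            raise ValueError("payload is shorter than the format block describes")
        out += field[::-1]  # single-byte fields are unchanged by the reversal
        offset += width
    # Bytes not covered by the format block are copied through unchanged.
    out += payload[offset:]
    return bytes(out)
```

Driving the swap from a per-message format block lets the Gateway convert
between byte orders (the 68000 family is big-endian, the VAX little-endian)
without knowing the meaning of each message type.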
Approximately 30 critical applications send periodic heartbeats to the Alarm
Server. If a heartbeat fails to arrive on time, the Alarm Server generates an
alarm internally. Data acquisition state information is also distributed through the
Alarm Server.
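The heartbeat logic described above might be sketched as follows. This is an
illustration under stated assumptions, not the Alarm Server implementation:
the class name, the single shared timeout, and the alarm text are all
hypothetical, and time is passed in explicitly to keep the example
deterministic.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per application and flag overdue ones."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s   # assumed: one timeout for all applications
        self.last_seen = {}          # application name -> time of last heartbeat
        self.alarmed = set()         # applications with an outstanding alarm

    def heartbeat(self, app: str, now: float) -> None:
        """Record a heartbeat; a fresh heartbeat clears any outstanding alarm."""
        self.last_seen[app] = now
        self.alarmed.discard(app)

    def check(self, now: float) -> list:
        """Return alarms for applications that have newly become overdue."""
        new_alarms = []
        for app, seen in self.last_seen.items():
            if now - seen > self.timeout_s and app not in self.alarmed:
                self.alarmed.add(app)
                new_alarms.append(f"HEARTBEAT_LOST {app}")
        return new_alarms
```

Remembering which applications are already in alarm keeps a silent
application from generating a new alarm on every check.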
The Alarm Server then distributes information to various instances of
specialized Display and Logging processes. The Alarm Display is an updating
MOTIF display which categorizes the current system state. Active buttons on
the display allow a user to obtain more information, drawn either from the
significant event messages themselves or from associated database
entries. The Alarm Logger writes a sequential file of significant event
information which can later be queried with a SQL-like browser.
An important component of the Significant Event System is an application which
receives both data acquisition and alarm state information. This application,
upon indication of a severe fault in a critical experiment component, instructs
the data acquisition system to interrupt data flow. Once corrective action has
been taken and data acquisition has resumed, the logged downtime information
can be used to track the frequency of problems.
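A minimal sketch of such a fault-handling application is given below. The
class name RunGuard, the "PAUSE" and "RESUME" return values, and the rule
that the run resumes as soon as the last critical fault clears are all
assumptions made for illustration; in practice the resumption may be an
operator decision.

```python
class RunGuard:
    """Pause data flow on severe faults in critical components; log downtime."""

    def __init__(self, critical):
        self.critical = set(critical)  # components whose faults stop data flow
        self.active_faults = set()
        self.causes = set()            # components implicated in current pause
        self.paused_at = None
        self.downtime_log = []         # entries: (start, end, sorted components)

    def on_alarm(self, component, severe, now):
        """Return "PAUSE" when this alarm should interrupt data flow."""
        if severe and component in self.critical:
            self.active_faults.add(component)
            self.causes.add(component)
            if self.paused_at is None:
                self.paused_at = now
                return "PAUSE"
        return None

    def on_clear(self, component, now):
        """Return "RESUME" once the last active critical fault has cleared."""
        self.active_faults.discard(component)
        if self.paused_at is not None and not self.active_faults:
            self.downtime_log.append((self.paused_at, now, sorted(self.causes)))
            self.paused_at = None
            self.causes = set()
            return "RESUME"
        return None
```

Each log entry records when data flow stopped, when it resumed, and which
components were involved, which is the information needed to tally how often
and for how long each component causes downtime.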
The current system is built with PASCAL and FORTRAN applications using a
Client/Server package layered over a DECNET communications package. We will
discuss our experience in the operation of this system and plans for future
upgrades.