Therac-25 software development and design

We know that the software for the Therac-25 was developed by a single person, using PDP-11 assembly language, over a period of several years. The software "evolved" from the Therac-6 software, which was started in 1972. According to a letter from AECL to the FDA, the "program structure and certain subroutines were carried over to the Therac 25 around 1976."

Apparently, very little software documentation was produced during development. In a 1986 internal FDA memo, a reviewer lamented, "Unfortunately, the AECL response also seems to point out an apparent lack of documentation on software specifications and a software test plan."

The manufacturer said that the hardware and software were "tested and exercised separately or together over many years." In his deposition for one of the lawsuits, the quality assurance manager explained that testing was done in two parts. A "small amount" of software testing was done on a simulator, but most testing was done as a system. It appears that unit and software testing was minimal, with most effort directed at the integrated system test. At a Therac-25 user group meeting, the same quality assurance manager said that the Therac-25 software was tested for 2,700 hours. Under questioning by the users, he clarified this as meaning "2,700 hours of use."

The programmer left AECL in 1986. In a lawsuit connected with one of the accidents, the lawyers were unable to obtain information about the programmer from AECL. In the depositions connected with that case, none of the AECL employees questioned could provide any information about his educational background or experience. Although an attempt was made to obtain a deposition from the programmer, the lawsuit was settled before this was accomplished. We have been unable to learn anything about his background.

AECL claims proprietary rights to its software design. However, from voluminous documentation regarding the accidents, the repairs, and the eventual design changes, we can build a rough picture of it.

The software is responsible for monitoring the machine status, accepting input about the treatment desired, and setting the machine up for this treatment. It turns the beam on in response to an operator command (assuming that certain operational checks on the status of the physical machine are satisfied) and also turns the beam off when treatment is completed, when an operator commands it, or when a malfunction is detected. The operator can print out hard-copy versions of the CRT display or machine setup parameters.

The treatment unit has an interlock system designed to remove power to the unit when there is a hardware malfunction. The computer monitors this interlock system and provides diagnostic messages. Depending on the fault, the computer either prevents a treatment from being started or, if the treatment is in progress, creates a pause or a suspension of the treatment.

The manufacturer describes the Therac-25 software as having a stand-alone, real-time treatment operating system. The system is not built on a standard operating system or executive. Rather, the real-time executive was written especially for the Therac-25 and runs on a 32K PDP-11/23. A preemptive scheduler allocates cycles to the critical and noncritical tasks.
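The priority rule can be illustrated with a minimal sketch. This is not AECL's code, which was proprietary PDP-11 assembly: the function `run_cycle`, the task names, and the per-task costs are all hypothetical, and the sketch ignores preemption entirely. It shows only that critical tasks always run while noncritical tasks run when cycle time remains, using the 0.1-second (100 ms) cycle reported for the Therac-25 scheduler as the budget.

```python
def run_cycle(critical, noncritical, budget_ms):
    """Run one scheduling cycle and return the names of the tasks that ran.

    Each task is a (name, cost_ms) pair. Names and costs are hypothetical.
    Critical tasks always run; a noncritical task runs only if its cost
    fits in the time left over.
    """
    ran = []
    remaining = budget_ms
    for name, cost in critical:
        ran.append(name)
        remaining -= cost
    for name, cost in noncritical:
        if cost <= remaining:
            ran.append(name)
            remaining -= cost
    return ran


# One 100 ms cycle: 70 ms of critical work leaves 30 ms for noncritical work.
ran = run_cycle(
    critical=[("critical_1", 40), ("critical_2", 30)],
    noncritical=[("noncritical_1", 20), ("noncritical_2", 20)],
    budget_ms=100,
)
print(ran)  # ['critical_1', 'critical_2', 'noncritical_1']
```

In this sketch the second noncritical task is simply skipped for the cycle; how the real executive handled work that did not fit is not something the available documentation lets us state.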

The software, written in PDP-11 assembly language, has four major components: stored data, a scheduler, a set of critical and noncritical tasks, and interrupt services. The stored data includes calibration parameters for the accelerator setup as well as patient-treatment data. The interrupt routines include

The scheduler controls the sequences of all noninterrupt events and coordinates all concurrent processes. Tasks are initiated every 0.1 second, with the critical tasks executed first and the noncritical tasks executed in any remaining cycle time. Critical tasks include the following:

Noncritical tasks include

It is clear from the AECL documentation on the modifications that the software allows concurrent access to shared memory, that there is no real synchronization aside from data stored in shared variables, and that the "test" and "set" for such variables are not indivisible operations. Race conditions resulting from this implementation of multitasking played an important part in the accidents.
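To see why a divisible "test" and "set" matters, consider the following sketch. It is an illustration of the general hazard, not a reconstruction of the Therac-25 code: the `busy` flag, the `claim` task, and the explicit schedules are assumptions introduced here. Python generators stand in for tasks, and each `yield` marks a point where the scheduler could switch tasks between the test and the set.

```python
def claim(shared, name, entered):
    """Hypothetical task: claim a shared flag using a separate test and set."""
    is_free = shared["busy"] == 0   # test
    yield                           # a task switch may occur here
    if is_free:
        shared["busy"] = 1          # set, based on a possibly stale test
        entered.append(name)


def run(schedule):
    """Step two tasks in the given order; return the tasks that claimed the flag."""
    shared = {"busy": 0}
    entered = []
    tasks = {
        "A": claim(shared, "A", entered),
        "B": claim(shared, "B", entered),
    }
    for name in schedule:
        next(tasks[name], None)     # advance one step; ignore finished tasks
    return entered


serial = run(["A", "A", "B", "B"])       # A finishes before B starts
interleaved = run(["A", "B", "A", "B"])  # both test before either sets
print(serial)       # ['A']
print(interleaved)  # ['A', 'B']
```

When the two steps run back to back, only one task claims the flag. When a switch falls between them, both tasks see the flag as free and both proceed, which is exactly the outcome that an indivisible test-and-set operation would rule out.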