Twenty-eight Americans were killed on February 25, 1991 when an Iraqi Scud hit the Army barracks in Dhahran, Saudi Arabia. The Patriot defense system had failed to track and intercept the Scud. What was the cause for this failure?
The Patriot defense system includes an electronic detection device called the range gate. It calculates the area in the air space where the system should look for a target such as a Scud. To predict where the Scud will appear next, it calculates the Scud's location based on the velocity of the Scud and the last time the radar detected it.
In the Patriot system, time was stored in a fixed-point register that had a length of 24 bits. Since the internal clock of the system counts in tenths of a second, the value 1/10 had to be held in that register. In binary it was stored as 0.00011001100110011001100, and the exact value of this representation is 209715/2097152. As we can see, this is not an exact representation of 1/10. It would take an infinite number of bits to represent 1/10 exactly in binary. So, the error in the representation is (1/10-209715/2097152), which is approximately 9.5E-8 seconds.
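The representation error can be reproduced with exact rational arithmetic. This is only a minimal sketch: the helper name `chop_to_bits` is made up for illustration, and it assumes that the register keeps 23 binary digits after the point and chops (truncates) the rest, which is the choice that gives the value 209715/2097152 quoted above.

```python
from fractions import Fraction

def chop_to_bits(value: Fraction, frac_bits: int) -> Fraction:
    """Chop (truncate) a positive fraction to frac_bits binary digits after the point."""
    scale = 2 ** frac_bits
    # Integer division drops everything beyond the last kept bit.
    kept = (value.numerator * scale) // value.denominator
    return Fraction(kept, scale)

FRAC_BITS = 23  # assumption: 23 fractional bits in the 24-bit register

tenth = Fraction(1, 10)
stored = chop_to_bits(tenth, FRAC_BITS)
error = tenth - stored

# The stored digits after the binary point, as a string of 0s and 1s.
bits = format(stored.numerator * 2 ** FRAC_BITS // stored.denominator, f"0{FRAC_BITS}b")

print("stored value :", stored)        # exact fraction held in the register
print("binary digits: 0." + bits)
print("error (s)    :", float(error))  # roughly 9.5E-8
```

Using `Fraction` instead of floating point keeps the calculation itself free of the very rounding error it is trying to measure.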
On the day of the mishap, the Patriot battery had been left running for 100 consecutive hours, hence causing an accumulated inaccuracy of 9.5E-8x10x60x60x100=0.342 seconds (10 clock cycles in a second, 60 seconds in a minute, 60 minutes in an hour, 100 hours).
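The accumulated drift is simple arithmetic, and a short sketch shows how it grows with uptime. The function name `clock_drift_seconds` is hypothetical. The per-tick error of 9.5E-8 seconds and the 10 ticks per second are the figures used above.

```python
ERROR_PER_TICK = 9.5e-8   # seconds lost per clock tick (from the text)
TICKS_PER_SECOND = 10     # the clock counts tenths of a second

def clock_drift_seconds(hours_up: float) -> float:
    """Accumulated clock error after hours_up hours of continuous running."""
    ticks = TICKS_PER_SECOND * 60 * 60 * hours_up
    return ERROR_PER_TICK * ticks

# Drift at a few uptimes, ending with the 100 hours reported for Dhahran.
for hours in (1, 8, 24, 100):
    print(f"{hours:>3} h -> {clock_drift_seconds(hours):.4f} s")

drift_100h = clock_drift_seconds(100)
```

The error per tick never changes. Only the number of ticks does, which is why the drift stays small on a system that is restarted often.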
The shift in the range gate due to the error of 0.342 seconds was calculated as 687m. For the Patriot missile defense system, the target is considered out of range if the shift is more than 137m. Because the shift was larger than 137m, the Scud was not targeted, and it killed 28 Americans in the barracks in Dhahran.
When I started looking at the Google search results for the problem, I found some very useful resources that would be of interest to the reader. These go beyond the simplistic explanation given above and tell the story behind the story. Here they are:
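As a rough illustration of how these two numbers relate, the sketch below estimates how long the system could run before the shift passes the 137m limit. It assumes that the shift grows in direct proportion to uptime, which is a simplification and not a statement of how the actual Patriot software computed the range gate. The names are made up for the example.

```python
SHIFT_AT_100_HOURS_M = 687.0   # range gate shift after 100 hours (from the text)
MAX_SHIFT_M = 137.0            # target is out of range beyond this shift (from the text)

# Assumption: the shift scales linearly with hours of continuous operation.
shift_per_hour_m = SHIFT_AT_100_HOURS_M / 100.0

def range_gate_shift_m(hours_up: float) -> float:
    """Estimated range gate shift in meters after hours_up hours of uptime."""
    return shift_per_hour_m * hours_up

def target_lost(hours_up: float) -> bool:
    """True when the estimated shift exceeds the allowed limit."""
    return range_gate_shift_m(hours_up) > MAX_SHIFT_M

hours_to_limit = MAX_SHIFT_M / shift_per_hour_m
print(f"limit reached after about {hours_to_limit:.1f} hours")
print("target lost at 100 hours:", target_lost(100))
```

Under this linear assumption the limit is reached after about 20 hours of continuous running, far sooner than the 100 hours on the day of the mishap.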
- This reference is the full GAO report of the investigation that resulted after the accident. “Patriot Missile Defense – Software Problem Led to System Failure at Dhahran, Saudi Arabia”, GAO Report, General Accounting Office, Washington DC, February 4, 1992.
- It should be pointed out that the Patriot missile was originally designed to be a mobile system and not to be used as an anti-ballistic system. In mobile systems, the clocks are reset more often. As per the article Operations: I Did Not Say You Could Do That! by Bill Barnes and Duke McMillin, here are some important observations: “It turns out that the original use case for this system was to be mobile and to defend against aircraft that move much more slowly than ballistic missiles. Because the system was intended to be mobile, it was expected that the computer would be periodically rebooted. In this way, any clock-drift error would not be propagated over extended periods and would not cause significant errors in range calculation. Because the Patriot system was not intended to run for extended times, it was probably never tested under those conditions—explaining why the problem was not discovered until the war was in progress. The fact that the system was also designed as an antiaircraft system probably also enabled the inclusion of such a design flaw, because slower-moving airplanes would be easier to track and, therefore, less dependent upon a highly accurate clock value.”
A student asked me why the designers did not use a clock cycle that could be represented exactly in the 24-bit register. A number close to 1/10 is 0.125 (that is, 1/8), which can be represented exactly as 0.001000000000000000000000 in a 24-bit register, and 8 clock cycles would then be equal to 1 second. I do not have an answer to this question, but I intend to find out from my computer science colleagues.
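The student's observation is easy to check: a fraction fits exactly in a finite number of binary digits only when its denominator is a power of two. The sketch below tests this for 1/8 and 1/10. The function name `is_exact_in_binary` and the choice of 23 fractional bits are assumptions made for the example.

```python
from fractions import Fraction

def is_exact_in_binary(value: Fraction, frac_bits: int) -> bool:
    """True if value needs no more than frac_bits binary digits after the point."""
    scaled = value * 2 ** frac_bits
    # If scaling by 2**frac_bits gives a whole number, nothing is chopped off.
    return scaled.denominator == 1

FRAC_BITS = 23  # assumption: fractional bits available in the register

for tick in (Fraction(1, 8), Fraction(1, 10)):
    print(tick, "exact:", is_exact_in_binary(tick, FRAC_BITS))
```

With a tick of 1/8 of a second, the stored value would carry no representation error, so no drift of this kind would build up over time.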
This post brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://nm.mathforcollege.com
It is very surprising how a negligible error can cause such massive destruction.
I am a computer engineer and I know how easy and common it is to ignore such an error. I also lived through a civil war and I know what a single SCUD missile can do.
This article is one of my favorites.
Sad to say, not so surprising really. Remember, the error was not in the computer. At its root, the error was human. I had followed the story in the 90’s but had never heard that the system was originally intended for aircraft.
Unfortunately, re-purposing equipment like that often involves oversights, and the use of computers in direct warfare, especially at that time, was still in its infancy. This was at a time when people were just discovering that there might be serious ramifications from what would come to be known as the Y2K bug, where a simple attempt at saving 2 bytes of memory cost millions of dollars in re-programming efforts.
It’s the details that seem to get us in the end, including the loss of Challenger in ’86. Who knew that it would get cold enough in Florida to fracture a rubber joint seal? Some people knew, but none of them were SRB designers. We used flight hardware designed for warm weather in a cold weather environment.
More recently, we’ve seen structural failures of bridges in the mid-west, and now the worst environmental catastrophe in US history in the Gulf of Mexico, again because BP used a device designed for shallow drilling in a deep-water environment.
With luck, programs like USF’s Numerical Analysis website and STEM will set a stronger impression on our future engineers, and they will pay more attention to the details and not rely solely on the software.
Great article.
Unless I misunderstand what Mr. Simmons meant by the phrase “…use of computers in direct warfare,” I believe he is misinformed when he stated that computers were just starting to be used at the time of the Patriot tragedy in 1991. Digital computers have been used on Polaris and the follow-on generations of Poseidon and Trident submarines since 1960. Similarly, our fighter aircraft and other naval ships could not have functioned without computers. They did not look like PCs or iPads, but they did their job.
It is understandable that legacy software was not designed for Y2K. Those systems needed every scrap of storage and other resources. What is inexcusable is that software developed during the last part of the 20th century, when computer hardware had many more resources, made no provision for Y2K. And you know who was the biggest culprit. And they still have not fixed a related problem. MS Excel still treats Year 1900 as a leap year when it was not. The only exception to the 4-year leap year rule is for years divisible by 100 unless they are divisible by 400. That’s why Year 2000 was a leap year. Who knows how Microsoft treats Year 2100.
Just so you know that I walk the talk, my company developed software for the Trident SSBN Navigation System around 1982. I was in charge of documenting the requirements. I had a lot of experience with problems associated with time, particularly at year changeover, when software time counters had to be reset when the sailor entered the new year number. Of course this approach had to go. What I wanted was a continuous time counter, counted from an agreed date, that would never be reset. All time calculations would be based on requests from the Operating System. The most significant parameter is time difference (TD). By doing it this way, all discontinuities in time are eliminated. We were assured that the TD would be done the same way. There are surprising little subtleties in this simple calculation that have proven to be prone to error. We also had the Operating System provide four-digit year displays with correct leap years for years divisible by 100.
There are a few reasons that I have been verbose about this subject. The first is to be careful when programming anything to do with time. I can’t think of a better way than this one to use on large systems. The first use of this kind of design that I’ve seen was in a celestial fix program on Polaris SSBNs in the 1960’s. Secondly, I like to emphasize how important experience is in Engineering. It seems that it sometimes is almost a forgotten quality.
I hope that a few people may have learned something from my comments. Tim Fitz
Do you mind if I quote a couple of your posts as long as I provide credit and sources back to your site?
My blog site is in the very same area of interest as yours and my users would genuinely benefit from a lot of the information you provide here.
Please let me know if this is alright with you. Many thanks!
Sure.
Good recap of the error!
Have you ever been able to find out why they didn’t use 1/8 of a second instead of 1/10?