That giant leap Neil Armstrong made fifty years ago might have been a small step for him, but in terms of managing risk, setting foot on the Moon represented the greatest technical achievement in human history. Consider the physics involved in getting to the Moon and back, the hazards encountered by placing a human in that kind of environment, and the state of the art – to be precise, the state of science and technology – in the 1960s, and the magnitude of the accomplishment becomes breathtakingly apparent.
Five decades removed, what isn’t apparent is the enormous pressure there was to produce those results. There was a space race on, and the Soviet Union was already up there. In 1961, two weeks after the US managed to put its first astronaut into a sub-orbital flight, President Kennedy set the lofty goal of sending a man to the Moon and returning him safely by the end of the decade. From that point on, space exploration was a lead story for the rest of the decade.
In the space of eight years, the scientists, engineers, and managers expected to convert that goal into reality did exactly that. It is an amazing story!
Moreover, in that first decade of the US manned spaceflight program, no human life was lost in space. In the history of managing risk, there is no greater success story.
That’s not to say no American lives were lost in the space race. In 1967 the three-member crew of Apollo 1 – Gus Grissom, Ed White, and Roger Chaffee – were killed during a training exercise. They were on the launch pad, strapped in their seats, in a 100% oxygen atmosphere pressurized to 16.7 PSI, when fire broke out inside their capsule. Rescue proved impossible, in large part because 28 latches sealed the entry port and the fire spread rapidly in the oxygen atmosphere.
Think there might be some important lessons about managing risk to be learned from all of this?
Read on!
Lesson Number One: Understand Hazard, Risk, and Consequences Properly
Managing risk effectively begins by understanding risk properly. The best way to understand risk is the simplest: a hazard is a source of danger – something capable of producing human harm. Risk is the measure of probability that a hazard actually produces harm. The degree of human harm – the consequences – is determined when an event actually happens.
It is that simple. This is not rocket science. Making things more complicated only serves to confuse: confusion does not advance the cause of managing risk and keeping people safe.
Applying this simple set of definitions to spaceflight, an unmanned rocket has no hazards. (Unless that rocket lands on somebody’s car or house.) Putting a human on the nose of a rocket is what creates the hazard. Doing that produces an extremely long list of hazards, many of which could prove fatal.
Calculating risk is a simple matter of figuring out probabilities: What are the odds that any given hazard produces an event? That said, calculating hazard probability accurately is hardly easy. But, like travelling to the Moon, doing something that is hard can be important.
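To make that arithmetic concrete, here is a minimal sketch in Python. The hazards, probabilities, and severity numbers below are invented purely for illustration – they are not drawn from any spaceflight data – but they show how a hazard list can be turned into rough risk figures once each hazard is given an estimated probability of producing an event and a severity for the consequences if it does.

```python
# Illustrative only: hypothetical hazards with made-up probabilities and
# severities, not actual spaceflight data.

hazards = [
    # (hazard, estimated probability of an event, severity of consequences 0-10)
    ("cabin fire during ground test", 0.001, 10),
    ("launch vehicle engine failure", 0.010, 9),
    ("suit puncture during EVA",      0.005, 8),
]

for name, probability, severity in hazards:
    risk_score = probability * severity   # simple expected-consequence measure
    print(f"{name}: risk score = {risk_score:.3f}")

# Deciding "that will never happen" is the same as setting probability to 0:
# the risk score drops to 0, and with it any reason to plan for the consequences.
```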
Given a proper understanding of hazard and risk, would it be correct to assume that when a flight crew is on the ground, buckled in their pressurized and oxygenated capsule, practicing the countdown for their mission, there are no hazards? There is no risk? There can be no event, most certainly not one with serious, or even fatal consequences?
History proved those assumptions to be wrong on all counts.
Lesson Number Two: Never Assume “That Will Never Happen”
On the eve of the fiftieth anniversary of the Apollo Fire, Air and Space Magazine published an in-depth look at the tragedy and the effect it had on the space program. Viewed in retrospect, the hazards were so obvious: fuel, oxygen… all that was missing was a heat source to complete the Fire Triangle. Think all those cabin electronics might have been a ready source? When NASA’s root cause investigation went public, one leading newspaper criticized the agency for putting three astronauts “into what even a high school chemistry student would know was a potential oxygen incendiary bomb, one needing only a spark to initiate catastrophe.”
The thing was, the people who put those three astronauts in harm’s way were the best and brightest technical and managerial talent on the planet.
Moreover, it wasn’t as if they hadn’t been told about the potential hazard. Months earlier, the Apollo Spacecraft Program Manager had received a warning letter from the program’s independent safety experts: “I do not think it technically prudent to be unduly influenced by the ground and flight success history of Mercury and Gemini under a 100 percent oxygen environment. The first fire in a spacecraft may well be fatal.”
Read that as “hazard, risk, and the consequences of an event.”
Assuming “that will never happen” assigns a probability of zero to the hazard. If it can’t happen – or won’t happen – no need to invest any thought into the potential consequences if it does happen.
Lesson Number Three: Never Compromise Your Responsibilities
The Monday after the tragedy, Gene Kranz – famous later in his career as the Flight Director on shift when Apollo 13 had its problem – called his team together and gave one of the greatest Safety Stump Speeches in the history of Safety Stump Speeches. It became known as “The Kranz Dictum: Tough and Competent”:
“Tough means we are forever accountable for what we do or what we fail to do. We will never again compromise our responsibilities…. Competent means we will never take anything for granted. We will never be found short in our knowledge and in our skills.… When you leave this meeting today you will go to your office and the first thing you will do there is to write “Tough and Competent” on your blackboards. It will never be erased. Each day when you enter the room these words will remind you of the price paid by Grissom, White, and Chaffee.”
Suffer a serious event – like Apollo 1 – and there’s bound to be an investigation. These days when investigating, some organizations are inclined to tread lightly on human failure, including the failures of those humans who were in charge at the time. That’s not what happened with the Apollo Fire: one engineer described the investigation – done internally – as “the most excruciating technical dissection of a machine I could ever imagine.”
People involved prayed the fault would not be found with their equipment or their work: they didn’t want to find out they were responsible for the tragedy. Who could blame them for thinking that?
But that’s not how good scientists, engineers, and managers really think about their responsibilities. Be the one responsible for causing the event, and what “they find” doesn’t matter nearly as much as what “you know.” If you’re responsible, you’ll know plenty.
Case in point: Kranz opened that now-famous “Tough and Competent” speech by telling his team:
“Somewhere, somehow, we screwed up. It could have been in design, build, or test. Whatever it was, we should have caught it. We were too gung ho about the schedule and we locked out all of the problems we saw each day in our work… Not one of us stood up and said, ‘Dammit, stop!’
I don’t know what Thompson’s committee will find as the cause, but I know what I find. We are the cause!”
The Monday after the tragedy, Kranz might not have known all the details as to how the event happened, but he was absolutely sure why it happened.
Lesson Number Four: Appreciate What the Leader Is Really Responsible For
Consider the plight of the line manager in charge of the Apollo spacecraft program. Aka, the boss. Joe Shea was his name: described as “brilliant” but also “arrogant.” Shea was the one who’d been warned about the hazard, the risk, and the potential consequences. He just didn’t do much of anything to manage the risk.
After the fire Shea “fell into a deep depression, suffering what some have called a breakdown. As he later wrote, wandering the gardens at Washington’s Dumbarton Oaks, ‘alone with a life I wished had ended with the three [astronauts].’”
Even after he left NASA to return to private industry, the accident tormented him. He would sit in his den at night, going over the events in his mind again and again. Shortly before the plugs-out test, Gus Grissom had asked him to join the astronauts in the spacecraft, to see for himself how “messy” the procedures were. Shea had considered it and decided that since there was no way to provide a communications line for him inside the command module, it wasn’t a “workable idea.”
What if Shea had instead thought, “That’s a darn good idea”?
The Right Stuff
Aleck Bond earned a degree in Aeronautical Engineering from Georgia Tech in 1943. Bond went to work for the National Advisory Committee for Aeronautics in 1948. A decade later, when that “Committee” became NASA, Bond was the manager responsible for the design of the ablative shield for the Project Mercury space capsule.
Meaning Aleck Bond was one of the best and brightest responsible for the greatest technical achievement ever.
Mercury capsules were bell-shaped; the curved bottom of the bell was what took on 2,000 degrees worth of heat produced by friction as the capsule re-entered Earth’s atmosphere. The ablative shield on the bottom and the insulation layer above it were designed to take the heat while keeping the cabin cool, relatively speaking.
Talk about Mission Critical!
In Bond’s words, “If that shield didn’t work, the astronaut would be boiled like an egg.” The insulation layer had to be thick enough to work – but work within the limits of the payload. Picture solving that problem in 1958 – using a slide rule and a chalkboard.
Responsible for getting it right, Bond convened two separate teams to perform the calculation. He sent one to a conference room at one end of the building, the second to the other end of the building. Contact between the teams was forbidden.
Eventually both teams came back with their answers: they were identical.
What did Bond do with their answer? He doubled their number, just to be on the safe side.
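For readers who like to see the pattern spelled out, here is a minimal sketch of that discipline – the function name, numbers, and tolerance are invented for illustration, not Bond’s actual heat-shield calculation: two independently produced estimates, a cross-check that they agree, and a margin applied before anything gets built.

```python
# Illustrative sketch only: the values and tolerance are invented,
# not the actual Mercury heat-shield calculation.

def cross_check_with_margin(estimate_a, estimate_b, margin_factor=2.0, tolerance=0.01):
    """Insist that two independently produced estimates agree,
    then apply a safety margin to the result."""
    if abs(estimate_a - estimate_b) > tolerance * max(estimate_a, estimate_b):
        raise ValueError("Independent estimates disagree - resolve that before proceeding")
    return margin_factor * max(estimate_a, estimate_b)

# Hypothetical insulation-thickness estimates (arbitrary units) from two teams
# working at opposite ends of the building, with no contact between them.
team_one = 1.5
team_two = 1.5

design_value = cross_check_with_margin(team_one, team_two)
print(design_value)  # 3.0 -- doubled, just to be on the safe side
```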
Prudently following the discipline of science – while adding in a healthy dose of wisdom for good measure: Aleck Bond was the perfect example of the right stuff needed to successfully manage risk.
Paul Balmert
July 2019