More than half a century after humans landed on the moon, spaceflight is suddenly getting exciting again. Deep-pocketed civilian astronauts, for instance, are now able to buy seats on orbital missions flown on reusable rockets, while commercial space stations and lunar bases are in the planning stages. A host of companies are launching megaconstellations, swarms of satellites designed to bring broadband Internet access to remote, disconnected corners of the Earth.
Despite this resurgent activity, the fact remains that once a spacecraft has left Earth and is travelling at hypersonic velocity in the vacuum of space, it is virtually impossible to fix designed-in faults. That provoked a pair of spaceflight engineers to issue a rare warning, one in which they call on the space engineering community to do more to quash a particular type of mission-destroying error that they say has "plagued" spaceflight since its inception.
Called sign errors, these mistakes are multifaceted. At its most basic, a sign error involves applying a critical design parameter the wrong way—backwards or upside-down, say—in software or hardware. Such reversals include wrongly using negative instead of positive numbers (or vice versa) in guidance, navigation, and control data, for instance, or simply routing current in the wrong direction through a circuit.
It doesn't end there; sign errors also include physical reversals, such as fitting acceleration sensors the wrong way round on circuit boards, getting the polarity of components (such as electrolytic capacitors) wrong, or perhaps inverting the orientation of the electromagnets used to position a spacecraft in three dimensions.
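To see why a single flipped sign can be so destructive, consider a hypothetical, deliberately simplified control loop (not drawn from any of the missions discussed here): with the correct sign, the controller damps an attitude error toward zero; with the sign flipped, the same code amplifies it.

```python
# Hypothetical illustration: a one-character sign error turns a
# stabilizing (negative-feedback) controller into a destabilizing
# (positive-feedback) one.

def simulate(gain, steps=20):
    """Apply a proportional correction to an attitude error each step."""
    error = 1.0
    for _ in range(steps):
        error += gain * error   # correct: gain < 0 shrinks the error
    return abs(error)

residual_correct = simulate(gain=-0.5)  # error decays toward zero
residual_flipped = simulate(gain=+0.5)  # sign error: error grows without bound
```

After 20 steps the correctly signed controller leaves a residual error near zero, while the flipped sign has multiplied the error more than a thousandfold.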
"Space exploration is the product of thousands of engineers, and about two-thirds of spacecraft failures are caused by tiny engineering mistakes," says Paul Cheng, a senior project leader with nonprofit The Aerospace Corporation in Los Angeles, CA, who sounded the warning on sign errors alongside his colleague Peter Carian in a paper published in the September edition of the Journal of Space Safety Engineering.
"Sign errors turn out to be the most common causes of spacecraft failures, ahead of decimal-point errors, or mix-ups between English and metric units. Sign errors are way on top," Cheng says.
It is high time sign errors were addressed, say Cheng and Carian, who take the unusual step in their paper of "beseeching" space engineers to address sign errors. It is not hard to see why: spaceflight insurers have calculated that 5.3% of satellites launched into orbit are lost in their first year, with 42% of those failing in their first two months.
With some 15,000 satellites expected to be launched in this decade alone, largely as parts of megaconstellations, that failure rate could lead to atrocious losses, and result in large volumes of uncontrollable space junk being left in orbit to boot.
After reading a litany of spacecraft failure reports, Cheng and Carian decided to spell out the ways in which various types of sign error have doomed missions, in the hope that they can help engineers avoid them.
The duo's first example is the U.S. National Aeronautics and Space Administration (NASA) Genesis probe, launched in 2001, which visited a spot in deep space some 1.5 million kilometers (about 932,000 miles) from Earth to capture particles from the solar wind inside a delicate matrix of aluminum wafers. During Genesis' reentry at the end of its mission in September 2004, a pencil-eraser-sized deceleration sensor called a g-switch was meant to sense the slowdown as the craft encountered Earth's dense atmosphere and activate deployment of a braking parachute and parafoil ahead of capture in midair by a helicopter.
Unfortunately, the sensor had been mounted upside-down, as nobody knew that a marker on one end of it indicated the only direction in which it could sense deceleration.
The outcome was that Genesis pounded into the Utah desert unbraked, smashing several precious particle collection wafers. This was puzzling at the time, as the same g-switch had worked perfectly on another NASA mission, Stardust, which launched two years before Genesis to sample the dust in a comet's tail. By pure chance, the g-switch had been installed correctly on Stardust, and had passed spin tests on the ground. As a result, the Genesis team saw no need to spin-test their differently oriented sensor circuit layout, not knowing "the new layout had voided the heritage," as the researchers say.
To avoid such "polarity surprises," Carian and Cheng recommend that the orientation of sign-sensitive components be tracked all the way through spacecraft development. Engineers, they say, should leave cautionary notes about sign-sensitive parts on CAD drawings, and on the boxes the components are kept in, to warn of the risk "of inserting things backwards."
Minus Sign Mayhem
That such basic measures are often all it takes to quash sign errors was also evident in the orbital breakup of the Japanese X-ray space telescope Hitomi just a month after its launch in February 2016. The problem: an unchecked over-the-air software update.
Launched by the Japan Aerospace Exploration Agency (JAXA) to probe black hole physics, Hitomi needed a software update in orbit to help its thrusters stabilize the telescope when its orbit took it over a magnetic disturbance, called the South Atlantic Anomaly, that upset its "star tracker" navigation system. That update included a modified thruster firing algorithm with a new data look-up table. That table, however, was modified manually, and the programmer responsible was not told to convert six negative numbers into positive values. The result was that out-of-control thruster firings sent the telescope into a relentless spin, the centrifugal loads tearing off its large solar panels before the craft broke up into 11 pieces, according to the U.S. Joint Space Operations Center (JSpOC), which monitors space debris.
The telescope's stability issues were "wholly survivable," say Carian and Cheng, until those thruster sign errors arrived unchecked. They concluded, "The satellite was doomed not due to a lack of exotic control tools, but because they paid no attention to elementary [software] safety practices."
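One elementary safety practice of the kind Carian and Cheng have in mind can be sketched in a few lines of code. This is a hypothetical illustration, not JAXA's actual tooling: before uplinking a manually edited look-up table, automatically compare its signs against the previous flight-validated version and flag every flipped entry for explicit human review.

```python
import math

def flipped_signs(old_table, new_table):
    """Return indices where an updated look-up table's entries changed
    sign relative to the flight-validated version -- each one a
    candidate sign error to be reviewed before uplink."""
    return [i for i, (old, new) in enumerate(zip(old_table, new_table))
            if old != 0 and new != 0
            and math.copysign(1, old) != math.copysign(1, new)]

# Hypothetical table data: three entries had their signs flipped
validated = [-0.12, -0.08, 0.30, -0.05, 0.41, -0.22]
uplinked  = [ 0.12,  0.08, 0.30,  0.05, 0.41, -0.22]

print(flipped_signs(validated, uplinked))  # → [0, 1, 3]
```

A check this simple would not decide whether the sign changes were intentional, but it would guarantee that no flipped sign reached the spacecraft unexamined.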
All Torque, No Action
A similar lack of communication between software and hardware engineers led to the loss of NASA and Boston University's TERRIERS satellite in 1999: an electromagnet coil used to provide torque to point the satellite's solar panels at the Sun had to be reoriented so it would fit in a housing, "but nobody told the software developers of this change," Carian and Cheng noted. "TERRIERS became uncontrollable, and exhausted its battery before a rescue could be mounted."
Battery exhaustion also struck SKIPPER, a U.S.-Russian joint venture, in 1995. Ground tests with a solar array simulator had verified that the spacecraft itself was drawing power and charging its battery, but a sign error was hiding in the weeds: the wiring on the satellite's actual solar array was connected backwards, so in space, instead of charging the battery, it depleted it. The result: immediate loss of mission.
It is not only satellites and space probes that have suffered sign errors; rockets are vulnerable, too, Aerospace Corp. notes. In July 2013, a Russian Proton-M rocket lifted off and immediately flew back down to the ground because its angular velocity sensors were installed upside down, despite being marked 'this way up'. In November 2020, a European Vega rocket was lost (along with the French and Spanish satellites it was carrying) because the cabling controlling the rocket engine's steering nozzle was connected backwards, inverting every directional command.
These and many other cautionary tales have caught the attention of specialists in the field of dependable systems.
Peter Ladkin, an engineer specializing in system safety and root cause failure analysis at Bielefeld University in Germany, describes Carian and Cheng's paper as a "fascinating piece of work" that highlights the risks inherent in the fact that "many devices are visually symmetrical, but functionally asymmetrical."
However, because most of the reported cosmic blunders were related to incorrect design and/or installation, Ladkin prefers a different label for the non-software-related sign errors. "I would call the majority of them 'functional misalignment errors' rather than simply sign errors," he says.
Lorenzo Strigini, director of the Center for Software Reliability at City University in London, thinks Cheng and Carian have "performed an essential service" by calling attention to the sign error problem. However, he has one reservation.
"The idea of having a special category of errors in which both reversing a capacitor, and typing a wrong sign in a software statement, are similar, is a brilliant idea up to a point, but it is not to be taken too far. It is a neat abstraction that helps them make the important point that the same abstract principles apply to defending against both."
Paul Marks is a technology journalist, writer, and editor based in London, U.K.