Collecting and publishing metrics is important for any organization, and there are many framework options for such tasks, such as Ganglia, Prometheus, and the ELK (Elasticsearch, Logstash, Kibana) stack. As important as metrics are for understanding current state, collection and publishing are just the first steps. The next is metric interpretation, to understand whether a given metric is positive or negative, and the final step is determining which metrics to optimize.
Let's Buy An Example Zoo
A useful concrete example to illustrate these points is a zoo, something most readers should be acquainted with. Assuming we wanted this to be "The Best" zoo, how exactly would that be measured? Even aiming for "The Biggest" honors is complex. Trying to increase visitor and revenue metrics is a good place to start, of course, but that assumes there is a reason for the increased draw; those types of increases generally don't happen from nothing, and marketing isn't free. In terms of other "Biggest" metrics, the zoo could expand physically, but does "square feet of capacity" include the parking lot? Each metric is valid as far as it goes, but each metric also calls for other metrics to provide further context.
It is also possible to over-optimize when trying to improve a metric, such as increasing "number of animals." Options could include adding a few lions, or some pandas, or 500,000 ants. While counting each ant as an animal is clearly a cheap numeric trick and no doubt misses the real intention of the metric, the software industry has been collectively counting ants for longer than many may realize—sometimes accidentally, and sometimes on purpose.
Ant Counting In Software Engineering
Counting Software Ants
As described in Robert X. Cringely's early-1990s book Accidental Empires and his documentary Triumph Of The Nerds, IBM might not have created the "KLOC" metric (thousand lines of code), but it turned it into a religion for a time. Software projects were estimated, assessed, and often paid for in KLOC. As code is the output of the software engineering process, it isn't unreasonable to want to count it, but attaching a meaning to lines of code, as so many have noted before, is where things get tricky. If KLOC gets bigger, that should indicate progress, but if KLOC goes down, did things get better or worse? Hopefully better, if someone cleaned up the codebase and removed and refactored redundancies. Complicating matters further, code counts for the same task can differ between programming languages, and it's hardly uncommon for modern systems to be written in multiple languages. On top of all that, developers can add their own individual flourishes to this problem by writing code in different styles, such as going extremely wide, where the metric "characters of code" is arguably more informative than "lines of code."
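The gap between "lines of code" and "characters of code" is easy to demonstrate. The sketch below is illustrative only; the clamp function and the counting helpers are invented for this example. It measures two functionally identical snippets, one written wide and one written tall:

```python
# Two functionally identical snippets, written in different styles.
# The "wide" style packs the logic onto one line; the "tall" style
# spreads the same logic over six lines.
wide = "def clamp(x, lo, hi): return lo if x < lo else (hi if x > hi else x)"

tall = """def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x"""

def loc(src: str) -> int:
    """Count non-blank lines of code."""
    return sum(1 for line in src.splitlines() if line.strip())

def chars(src: str) -> int:
    """Count non-whitespace characters."""
    return sum(1 for c in src if not c.isspace())

# Same behavior, a 6x difference in "lines of code"...
print(loc(wide), loc(tall))      # prints "1 6"
# ...while the character counts are nearly identical.
print(chars(wide), chars(tall))
```

Any metric based on line counts rewards one of these styles and punishes the other, even though the program is the same.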
Function Points, created at IBM in 1979, have been around for decades as a way to measure system complexity without counting lines of code, for all of the above reasons. But even the original research paper admitted that function points were highly correlated with the very thing they were trying not to use. I had the same experience the one time I was exposed to function points on a project many years ago. We were documenting screens, inputs, and outputs, but I was also mentally recalling the amount of code behind each button to gauge relative point complexity.
A friend once told me there was an initiative at his company to generate "re-usable code," and those who generated such code would be given both praise and a bonus. Some software engineers suddenly started generating a flurry of 1-line functions to boost their re-usable code metrics, as all those tiny functions could theoretically be re-used by somebody. The intention of a smaller, more efficient codebase wasn't wrong, but praising only the writers of re-usable code, and not its users as well, is what doomed this effort to go sideways.
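The one-line-function trick is easy to simulate. The sketch below is hypothetical; any real "re-usable code" metric would be more elaborate, but a naive one might simply count function definitions, which a handful of trivial wrappers can inflate:

```python
import ast

def count_functions(source: str) -> int:
    """A naive 're-usable code' metric: count function definitions."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.FunctionDef) for node in ast.walk(tree))

before = """
def tax(amount):
    return amount * 1.08
"""

# The same logic, shredded into "re-usable" one-liners to game the metric.
after = """
def rate():
    return 1.08

def multiply(a, b):
    return a * b

def tax(amount):
    return multiply(amount, rate())
"""

print(count_functions(before))  # prints 1
print(count_functions(after))   # prints 3
```

The metric tripled while the behavior, and arguably the actual re-usability, stayed the same.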
Counting Data Ants
In the past decade, Big Data entered the public lexicon, and it became popular to talk about data size. Bigger was better, at least in terms of impressing people at conferences. But I once had a conversation where someone said "we have 100 TB of data" as if it was a bad thing, and just as importantly, as if that was the end of the conversation. Having a footprint of 100 TB was a factoid, a data point, but that is neither good nor bad without further context. That single number doesn't say how many environments were included (e.g., production, QA, etc.), what kind of data it was (structured vs. unstructured), whether it covered only the current version or historical copies as well, on what storage media the data was persisted, and at what price-points, for a starter list of questions. A metric is a metric as far as it goes, but interpretation requires a lot of follow-up. It's an art. It's worth wondering how the same person would feel if the very next day they suddenly only had 50 TB of data, perhaps in a keystroke. Did things get better, or worse?
Although it doesn't happen as much now as it used to, I've also had more than a few conversations where someone tried to define Big Data narrowly, with the punchline being "oh, that's not Big Data"—a real nerd put-down. It's true that some data use-cases are larger than others, but I always thought that if the storage, processing, or data-serving needs exceeded a single computer, then as far as that use-case was concerned it was a Big Data problem, irrespective of how other major tech firms might classify it.
Case in point: I was fortunate to have one of the system engineers on IBM's 2011 Jeopardy project give a fascinating talk in 2015 for the big data meetup I run. The primary input corpus of crawled documents (especially Wikipedia) was about 50 GB. According to some, that's not big data. After indexing and optimizing that data for searches, plus adding a few other supplemental sources, the data footprint expanded to 500 GB. According to some, that still might not be big data. But because the Jeopardy-playing system needed consistent, very sub-second response times, all that indexed and optimized data wound up being sharded over 90 servers. I would call that a Big Data use-case, especially for its time.
Counting Monetary Ants
Speaking of narrow definitions, "money is the root of all evil" is a frequent misquote from the New Testament. Money can be useful for obtaining food and shelter, and for funding other necessities such as payroll and cloud compute charges. Per above, if there is no money, there is no zoo. The actual quote from 1 Timothy 6:10 is "for the love of money is a root of all kinds of evil"—money itself not being "the problem" per se, but rather an emphasis on money over everything else, including the safety of employees or the public.
Boeing's 737 MAX crash saga was, at an implementation level, a technology and software problem. The MCAS (Maneuvering Characteristics Augmentation System) introduced into the 737 MAX, so that the plane could be advertised as backward-compatible with the 737NG and not require any additional pilot training, wound up being the source of problems instead of the solution, with catastrophic results. The January 2021 Communications article "Boeing's 737 MAX: A Failure of Management, Not Just Technology" is a good overview of this topic.
Like most complex problems, the origins went back many years, specifically to the 1997 Boeing and McDonnell Douglas merger. The January 2020 Quartz article "The 1997 Merger That Paved The Way For The Boeing 737 Max Crisis" cut to the heart of the matter:
"In a clash of corporate cultures, where Boeing's engineers and McDonnell Douglas's bean-counters went head-to-head, the smaller company won out. The result was a move away from expensive, groundbreaking engineering and toward what some called a more cutthroat culture, devoted to keeping costs down and favoring upgrading older models at the expense of wholesale innovation. Only now, with the 737 indefinitely grounded, are we beginning to see the scale of its effects."
It's not like Boeing didn't do upgrades. Even the fabled 747 went through -200, -300 and -400 series "releases," for lack of a better word, after the initial 747-100. And that was all before the McDonnell merger. I would wager, however, that Boeing's customers at that point knew what they were getting with each upgrade.
It had to be less expensive, in terms of both money and human lives, just to do the right thing in the first place and provide 737 MAX pilot training up front; I'm sure somebody by now has done those calculations. Why wasn't that path obvious? Because more than a few people got caught up optimizing the wrong things. A lot of priority was put on counting money, but not nearly enough on crew and passenger safety.
Metrics are critical; without them you are flying blind. But be careful what you try to optimize because you might just get it.
Robert X. Cringely, Accidental Empires, https://www.cringely.com/tag/accidental-empires/
On comparing counting lines of code between languages: http://www.projectcodemeter.com/cost_estimation/help/GL_sloc.htm
Function Points: https://en.wikipedia.org/wiki/Function_point
The 1997 Merger That Paved The Way For The Boeing 737 Max Crisis, Quartz, January 2020, https://qz.com/1776080/how-the-mcdonnell-douglas-boeing-merger-led-to-the-737-max-crisis/
737 MAX Investigation, House Transportation Committee, September 2020, https://transportation.house.gov/committee-activity/boeing-737-max-investigation
737 MAX: A Failure of Management, Not Just Technology, Communications, January 2021, https://cacm.acm.org/magazines/2021/1/249448-boeings-737-max/fulltext
Downfall: The Case Against Boeing, Netflix, 2022, https://www.netflix.com/title/81272421
Doug Meil is a portfolio architect at Ontada. He also founded the Cleveland Big Data Meetup in 2010. More of Doug's ACM articles can be found at https://www.linkedin.com/pulse/publications-doug-meil