The BLOG@CACM post "Data Governance and the Psychology of Tension Management" described the inherent conflict in data governance: data needs to be kept safe, but it also needs to be used to provide value, and governance needs to be infused into solutions to be most effective. Too often, the data governance saga ends at the point of obtaining access to data, but the story isn't over, especially for analytics in regulated environments where the external publishing of numbers has consequences.
The Current State Of Analytic Governance
Let's assume an intrepid analyst requests and is granted access to a dataset. Happy days, right? Perhaps. All organizations need metrics for internal management, and many organizations like to publish numbers externally even just for promotional purposes, whether they be market metrics such as number of customers or throughput metrics such as number of orders. Where those numbers come from raises important questions.
The first questions are technical. Web-based transactional systems have had time to mature in the last few decades, with source code management, automated unit testing, build management, and continuous integration being well-established patterns across most industries. In contrast, if one sees "X million customers" or "Y million orders" on a website, where exactly did those numbers come from? They may have been obtained after scrolling through potentially dozens or hundreds of dashboards. Or perhaps somebody ran a custom script, where the queries were hopefully similar, if not identical, to those backing the dashboards. Either way, assuming that any of this analytic code has been checked in, reviewed by other analysts, covered by unit tests, and given an automated build is admirable but likely hopelessly optimistic.
The next set of concerns is regulatory. Public companies in the United States are legally obligated to publish certain numbers, and there are potential penalties (including criminal charges) if those numbers aren't accurate. Sarbanes-Oxley (SOX) requires, among other things: "enhanced reporting requirements for financial transactions, including off-balance-sheet transactions, pro-forma figures and stock transactions of corporate officers. It requires internal controls for assuring the accuracy of financial reports and disclosures, and mandates both audits and reports on those controls." If a public company posts a number, irrespective of any other industry-specific regulations, the odds are high that SOX has something to say about it. SOX became law in 2002 in the aftermath of the Enron and WorldCom accounting scandals. Even with the best of intentions, humble reports and dashboards could become evidence if not implemented and managed properly.
Challenges In Analytic Governance
The financial sector in the United States has no small amount of regulatory oversight. Agencies include, but are not limited to, the Federal Financial Institutions Examination Council (FFIEC), the Board of Governors of the Federal Reserve System (FRB), the Federal Deposit Insurance Corporation (FDIC), the National Credit Union Administration (NCUA), the Office of the Comptroller of the Currency (OCC), and the Consumer Financial Protection Bureau (CFPB). The FFIEC sets many standards, including those on Data Management and Data Governance. Banks also must comply with Payment Card Industry (PCI) standards to help ensure security of credit card transactions in the payments industry. There is also the Securities and Exchange Commission (SEC). And everybody wants reports.
Perusing just one of the FFIEC filings (FFIEC 009 below) is a small demonstration of the reporting effort required as all of those little white boxes need to get filled in year over year (and sometimes month over month).
Note that "Schedule C, Part II: Claims on a Guarantor Basis and Memorandum Items" is distinct from "Schedule C, Part I: Claims on an Immediate-Counterparty Basis," "Schedule L: Foreign-Office Liabilities," "Schedule O: Off-Balance-Sheet Items," "Schedule D: Claims from Positions in Derivative Contracts," etc. It's a 34-page filing, after all.
These analytic and reporting examples come only from financial industry regulatory requirements. Analytics for improving internal operations or cross-selling financial products would be a completely different category, but still under the Data and Analytic Governance umbrella.
In healthcare, the HIPAA Security and Privacy Rules rightfully loom large, such that one can practically identify developers who work with healthcare data because they will insert "HIPAA" randomly in sentences almost as a nervous tic. This section, however, will go beyond topics like handling PHI and encryption at rest and in flight to the previously mentioned "customer counting" theme: what is a patient? It depends. Multiple analysts can look at the same database and answer that question differently, with each answer being "technically accurate" but distinct.
Obtaining the number of patients via a count of demographic records is arguably the simplest analytic, but even that can have challenges, as patients are sometimes registered under "stub" or partial records, which could then be merged into other patient records at a later date. Thus the count of all demographic records could differ from the count of only the active demographic records, even though both numbers are valid in their own context.
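The gap between the two counts can be made concrete with a minimal sketch. The record layout here (a `status` field and a `merged_into` pointer) is a hypothetical assumption for illustration, not the schema of any particular registration system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemographicRecord:
    # Hypothetical layout: real registration systems will differ.
    record_id: str
    status: str                        # "active" or "stub"
    merged_into: Optional[str] = None  # set when folded into another record


def count_all_records(records):
    """Every demographic row ever registered, stubs and merged rows included."""
    return len(records)


def count_active_patients(records):
    """Only rows that are active and have not been merged into another record."""
    return sum(
        1 for r in records
        if r.status == "active" and r.merged_into is None
    )


records = [
    DemographicRecord("p1", "active"),
    DemographicRecord("p2", "active"),
    DemographicRecord("p3", "stub", merged_into="p1"),  # partial record, later merged
    DemographicRecord("p4", "stub"),                    # partial record, not yet resolved
]

print(count_all_records(records))      # 4
print(count_active_patients(records))  # 2
```

Both functions are "correct"; what matters for governance is that each has a name and a documented definition, so a published number can be traced back to one of them.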
Counting patients with specific conditions is a long-standing complexity in healthcare analytics that shows up in multiple contexts. A count of diabetic patients, for example, seems like something that should be fairly obvious, but could require evaluating not just original diagnoses (e.g., ICD codes), but also diagnoses mapped to other ontologies (e.g., SNOMED), medications indicative of diabetes (which could require either a long list of included medications or a crosswalk to other pharmacological classifications), and observations or lab tests indicative of diabetes. There are the technical aspects of coming up with a count, but what can be just as complex and important is ensuring one can explain the criteria.
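One way to keep such a count explainable is to record which criteria each patient satisfied rather than only a yes/no flag. The sketch below assumes a hypothetical, heavily simplified `Patient` structure, and the code prefixes, medication names, and lab threshold are illustrative placeholders, not a clinically reviewed definition.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical, simplified patient summary for illustration only.
    patient_id: str
    icd10_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    hba1c_results: list = field(default_factory=list)  # percent values


# Illustrative criteria only; a real definition would be clinically reviewed.
DIABETES_ICD10_PREFIXES = ("E10", "E11")
DIABETES_MEDICATIONS = {"metformin", "insulin glargine"}
HBA1C_THRESHOLD = 6.5


def diabetes_evidence(patient):
    """Return the criteria a patient satisfies, so the count can be explained."""
    reasons = []
    if any(code.startswith(DIABETES_ICD10_PREFIXES) for code in patient.icd10_codes):
        reasons.append("diagnosis")
    if patient.medications & DIABETES_MEDICATIONS:
        reasons.append("medication")
    if any(value >= HBA1C_THRESHOLD for value in patient.hba1c_results):
        reasons.append("lab")
    return reasons


patients = [
    Patient("a", icd10_codes={"E11.9"}),
    Patient("b", medications={"metformin"}),
    Patient("c", hba1c_results=[5.4, 7.1]),
    Patient("d", icd10_codes={"I10"}, hba1c_results=[5.2]),
]

evidence = {p.patient_id: diabetes_evidence(p) for p in patients}
by_diagnosis_only = sum(1 for r in evidence.values() if "diagnosis" in r)
by_any_criterion = sum(1 for r in evidence.values() if r)

print(by_diagnosis_only)  # 1
print(by_any_criterion)   # 3
```

The same four patients yield a count of one or three depending on the definition, and the `evidence` mapping shows why each patient was included.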
The reason why these examples are relevant beyond clinical purposes is that these count-of-patient metrics could be published on a website for marketing purposes or stated in a yearly report, just like the hypothetical business at the top of this post. Additionally, for-profit public healthcare corporations would be subject to SOX and other regulations as in other industrial sectors.
Self-Service Business Intelligence tools have been successful, leading to hundreds and sometimes thousands of dashboards and reports in some organizations. But that proliferation can create problems of its own as governance of the output can become more difficult with respect to determining which are the right analytics for a given situation.
One way to address the governance of analytics is to categorize by risk, for example:
- Regulatory Reports – Most Critical. This information is submitted to regulatory organizations.
- External Reports – Critical. These are reports that are published to investors or Boards of Directors.
- Strategic Reports – Essential. Reports that are delivered to senior business leaders for decision-making.
- Operational and Ad Hoc – Reporting that supports daily operations and workflow.
Calling back to the theme of Tension Management, once the analytics are risk-rated, varying levels of governance and controls can be applied appropriately, with the most attention and control at the top tiers, while still allowing flexibility and exploration on the lower end.
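A risk-rated scheme like this can be expressed as a small lookup from tier to required controls. The tiers below follow the list above; the control names and their assignment to tiers are hypothetical, chosen only to illustrate the idea of tighter controls at the top and flexibility at the bottom.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    # Tiers follow the list above; a lower number means higher risk.
    REGULATORY = 1
    EXTERNAL = 2
    STRATEGIC = 3
    OPERATIONAL = 4


# Hypothetical control names and assignments, chosen only for illustration.
REQUIRED_CONTROLS = {
    RiskTier.REGULATORY: {"version_control", "peer_review", "automated_tests", "sign_off"},
    RiskTier.EXTERNAL: {"version_control", "peer_review", "automated_tests"},
    RiskTier.STRATEGIC: {"version_control", "peer_review"},
    RiskTier.OPERATIONAL: set(),
}


def missing_controls(tier, controls_in_place):
    """Return the controls a report still needs for its tier, sorted for stable output."""
    return sorted(REQUIRED_CONTROLS[tier] - set(controls_in_place))


print(missing_controls(RiskTier.REGULATORY, {"version_control", "peer_review"}))
# ['automated_tests', 'sign_off']
print(missing_controls(RiskTier.OPERATIONAL, []))
# []
```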
Lastly, source code management and testing behind higher-risk analytics need to be treated with the same rigor as other software development efforts, not just for the internal efficiency of producing required reports, but also for traceability. Numbers don't mean anything if people don't understand, or don't trust, the computation behind them.
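As a rough illustration of what that rigor might look like for a "number of orders" metric, the sketch below puts the definition in a single function with a test beside it. The order fields (`order_id`, `placed_on`, `cancelled`) and the counting rule are assumptions made for the example.

```python
from datetime import date


def count_orders(orders, start, end):
    """Count distinct, non-cancelled orders placed within [start, end]."""
    return len({
        o["order_id"] for o in orders
        if not o["cancelled"] and start <= o["placed_on"] <= end
    })


def test_count_orders_excludes_cancelled_and_duplicates():
    orders = [
        {"order_id": 1, "placed_on": date(2023, 1, 5), "cancelled": False},
        {"order_id": 1, "placed_on": date(2023, 1, 5), "cancelled": False},  # duplicate row
        {"order_id": 2, "placed_on": date(2023, 1, 9), "cancelled": True},
        {"order_id": 3, "placed_on": date(2023, 2, 1), "cancelled": False},  # outside window
    ]
    assert count_orders(orders, date(2023, 1, 1), date(2023, 1, 31)) == 1


test_count_orders_excludes_cancelled_and_duplicates()
```

Once a definition like this is checked in and reviewed, a dashboard and an ad hoc script can both call it instead of each carrying a slightly different query.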
Doug Meil is a software architect in healthcare data management and analytics. He also founded the Cleveland Big Data Meetup in 2010. More of his BLOG@CACM posts can be found at https://www.linkedin.com/pulse/publications-doug-meil. Michael Onders is Executive Vice President and Chief Data Officer at KeyBank.