Sometimes the intricacies of tax laws are mind-boggling, even to lawyers. Sarah Lawsky, a law professor at Northwestern University School of Law and Jonathan Protzenko, a principal researcher at Microsoft Research, were working to translate Section 121 of the U.S. Tax Code, which stipulates how much a taxpayer can deduct from their income taxes from the profit of the sale of a home, into programmable code.
They found themselves stumped, because while the law stipulates a profit on the first $250,000 of a home sale is not to be taxed, "there's like nine layers of exceptions," including whether a person served in the military or is married, or if a spouse is deceased, Protzenko says.
U.S. law is "insanely complicated, so when you have that amount of insanity there's no way a human can confidently claim, `Give me your situation and I'll give you the right answer'," he says. "You need code to capture and precisely express what's supposed to happen, because English is too fuzzy and irregular."
Protzenko and Lawsky spent hours debating a fine point, "and she said, `Gosh, I thought I knew this text well'," he recalls. "She teaches it to her students every year, but transcribing law into code requires you to think about the most minute details, and there was a fair amount of head scratching to make sure we were 100% correct about what the law means."
Thankfully they were, and the code was embedded into Catala, a programming language developed by Protzenko's graduate student Denis Merigoux, who is working at the National Institute for Research in Digital Science and Technology (INRIA) in Paris, France.
It is not often lawyers and programmers find themselves working together, but Catala was designed to capture and execute legal algorithms and to be understood by lawyers and programmers alike in a language "that lets you follow the very specific legal train of thought," Protzenko says.
In highly regulated industries, critical laws are translated precisely into code that reflects their intent. This is especially true when it comes to tax software and software that verifies Health Insurance Portability and Accountability Act (HIPAA) compliance. Yet, tax software "does not formalize statutes in a meaningful way,'' according to Lawsky.
Tax forms created by the government essentially take tax laws and put them into algorithms—and they are not a direct translation of the law, she says.
For example, in a prepublication article for the Ohio State Technology Law Journal, Lawsky wrote that software programs such as TurboTax encode tax forms, which are not law. They are prepared by the government, which collects information and turn portions of the law into an algorithm for taxpayers to apply.
"The difficult part of the coding, and the judgment calls, are almost entirely performed by the government, not by those who code tax preparation software," Lawsky wrote.
"And because forms turn law into algorithms, the forms themselves—not the instructions … may contain judgments about the law [and] sometimes law that is unclear," she says. "The forms themselves abstract away from the law."
You would think something as numerical as income tax law would be similar to mathematical logic, but it is not, Protzenko says, because it is not written with the precision and clarity that would "make it amenable to a very mathematical reading of it."
For example, that law does not mention a number may need to be rounded into whole cents. "The law won't tell you what you're supposed to do with rounding numbers and that can lead to ambiguity and a lack of specification of what's supposed to happen," he says.
Healthcare law is also very complex. Faisal Khan, senior legal counsel at healthcare law firm Nixon Gwilt Law in Vienna, VA, says, "Software for HIPAA compliance must incorporate algorithms that target and hit on all the top-level statutory requirements and implementing regulations.'
To make that happen, Khan says, "There must be a team of compliance-related input as many of the regulations essentially function as guidelines for companies to adhere to."
That means a process or security check that may be compliant for a small company may not automatically be compliant for a large company or health system, Khan says. Moreover, software data should be verified by compliance specialists because enforcement professionals at the U.S. Department of Health and Human Services are not only going to review documents, but also will scrutinize how key individuals and stakeholders are following HIPAA compliance processes and tweaking those processes as necessary as things change, according to Khan.
Thus, while software is a key solution to reducing costs and standardizing practices, "There needs to be a human element to support implementing an algorithm-based solution on the ground," Khan says. "What works from a logic perspective may not be the best solution based upon company operations and existing processes."
The Catala team incorporated the human element. "Based on our experience with how French tax code and many other bits of computational law around the world works, the common theme is that there are two worlds that don't talk to each other," Protzenko says.
There are legal experts who understand legal documents and can offer expert opinions, and then there are the programmers tasked with writing code that faithfully implements the law, he says. Previously, this was "done in a vacuum," each working on his or her own, which Protzenko says "resulted in many disasters," such as people not getting the right amount of money in their paychecks, or people receiving incorrect tax return checks.
There have been other attempts to create synergies between legal experts and programmers, but Protzenko says what distinguishes his team's work is that they are using tools and ideas that stem from a long history of research into programming languages.
"The two worlds have their own interpretation, and code ends up being wrong and then there's human consequences on the line," he says. "We said, `We have to do something different and design a new programming language that allows programmers and lawyers to talk to each other'."
Tokyo-based NTT Data, a global IT services provider processing over 100 million healthcare claims for its clients annually, has relied on the fact that both its third-party vendors and its custom-designed software use rules-based programming to ensure new regulations are being translated correctly, says Kumar Srinivas, CTO for NTT Data's health insurance business. The company also relies on key performance indicators that measure the success and outcomes of codification of those regulations. Translating complex healthcare laws into programmable code is extremely resource-intensive, Srinivas says.
"For the most part, health plans try to achieve [an] 80% or higher auto adjudication rate; for example, software programs that handle the codification without manual intervention," he says. "However, the remaining 20% … needs to be processed manually. In addition, there is a need to validate/audit—or what we call QA (quality assurance)—the results to ensure that the processing has occurred in the right way."
Artificial intelligence (AI) is disrupting the process of translating laws into code, he says. "The codification of laws to software programs has moved into a very disruptive phase with the advent of AI and machine intelligence," Srinivas says. "What was once a logical progression using a rule-based system is now more in the realm of supervised or unsupervised models, neural networks, and also Bayesian models," networks that can explain very complicated structures with the ability to use prior real-world knowledge, he says.
At Avalara, a Seattle, WA-based tax compliance software provider, a team of more than 100 content researchers comprised of lawyers, certified public accountants (CPAs), and tax specialists maintain and research tax content. They use a variety of software and manual processes to monitor legislative changes and court cases that could clarify ambiguity, tax rate schedule changes, and cases in which cities have expanded geospatial boundaries, says Kevin Hess, director of product management.
In some instances, Avalara's software developers work with lawyers and CPAs to make changes. For example, when a content researcher finds a tax rate, boundary, or tax rule change, and it fits into Avalara's software engine logic, the change will be entered into the content management system, reviewed by the content team, and tested in the end product to make sure it is rendering properly, Hess says.
The tax engine already supports a core set of logic, so it is able to do many tax calculations, he says.
Some unique use cases kick off development for inputting new logic, he says. For example, alcohol excise taxes are calculated in Ohio on a per-ounce basis, which Hess says is not uncommon. Because the state defines the tax rate per ounce or any fraction of an ounce, Avalara needed to update the system to round 11.2 ounces up to 12, he says.
"Because tax rates and rules are defined as part of an abstract domain-specific language, our engine can apply them even when something doesn't fit into the existing logic with requiring a change within the engine itself," notes Marcus Larner, vice president of engineering at Avalara. "Instead, the change would happen in the content that we're pulling into our logic."
The Catala system picks up exceptions in the U.S. tax code and has an override mechanism to follow non-monotonic legal reasoning, which is a term for something that starts with a general statement, only to find that does not apply further down in the code, Protzenko says. Internally, this is simplified by the system's compiler to a core language based on default calculus, he says. It is intentionally designed not to resemble any existing programming language.
"If we were doing that with Python, lawyers would not understand what Python means," Protzenko says. "We've crafted code with reasoning that lawyers understand. That's the exception mechanism. By having that feature in the language, our code naturally follows the structure of the law and makes it easy for lawyers to follow along."
Talks continue with French tax officials, and the Catala team is hopeful the language will be used within a few years to run some chunks of French income tax computation, says Merigoux.
Merigoux, D., Chataing, N., and Protzenko, J.
Catala: A Programming Language for the Law, 2021, https://arxiv.org/abs/2103.03198
Form as Formalization (April 27, 2020). Northwestern Public Law Research Paper No. 20-09, Ohio State Technology Law Journal, forthcoming, Available at SSRN, https://ssrn.com/abstract=3587576
Programming Language Converts Laws Into `Provably Correct' Computer Code, Discover Magazine, 23 March 2021, https://bit.ly/3yeVh1s
Translating law into code-why computer scientists and lawyers must join forces, Digital Society Blog, 2019, Institut Fur Internet Und Gesellschaft, https://bit.ly/3B8u02S
©2022 ACM 0001-0782/22/1
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2022 ACM, Inc.