In the final lunchtime seminar of the spring term, Stanford University’s Professor David Engstrom shared key findings from a recent study exploring AI usage across 120 US federal government organisations. The study, which took a year to complete, was undertaken by his 15-strong multidisciplinary team of law, computer science and business school researchers.

Around 45% of the federal organisations examined as part of the study were either experimenting with, or had already deployed, AI / machine learning technology, Prof Engstrom revealed. Usage was spread across all forms of government, he added, but tended to be clustered into activities which focused on regulatory monitoring / analysis, procurement, engagement with the public, and the direct delivery of services. For example, in relation to regulatory monitoring, the US Food and Drug Administration was using AI to help with the analysis of drug side-effect reporting, Prof Engstrom explained. In relation to delivery of services, the US Postal Service was experimenting with self-driving vehicles, he added.

According to the study, AI-assisted tools that were more central to the delivery of justice included those being used by the Patent and Trademark Office. For example, tools were being used to help patent reviewers classify patents from the 10,000 options available, Prof Engstrom said. And, in relation to trademarks, “image similarity” tools were being used by reviewers to compare new applications with those already registered. Meanwhile, in the US Social Security Administration, AI-assisted tools were being used both to allocate and triage cases and to review draft decisions for 30 commonly made errors.
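The SSA-style draft-review tool described above can be pictured as a checklist screen run over the text of each draft decision. The rules and error names below are purely hypothetical illustrations (the agency’s actual checks are not detailed in the talk); this is a minimal sketch of the general pattern, not the real system:

```python
import re

# Hypothetical error checklist: each named error maps to a pattern that,
# when found in a draft decision, flags the draft for human correction.
ERROR_CHECKS = {
    "missing_onset_date": re.compile(r"onset date:\s*$", re.MULTILINE),
    "unresolved_placeholder": re.compile(r"\[TBD\]|\[INSERT\]", re.IGNORECASE),
}

def review_draft(draft: str) -> list[str]:
    """Return the names of checklist errors found in a draft decision."""
    return [name for name, pattern in ERROR_CHECKS.items() if pattern.search(draft)]
```

A real deployment would carry many more checks (the study mentions 30), some of which would likely be learned models rather than fixed patterns, but the workflow of flagging drafts against a known error list is the same.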

In relation to “buy versus build”, more than half of the AI-assisted tools were developed in-house, the research had discovered. One explanation for this preference was that many tools required the “embedded expertise” of agency employees to develop them. Later, Prof Engstrom noted that AI-assisted tools already being used by US federal organisations were typically not very sophisticated—but often did not need to be, because they were only intended to overcome narrow administrative or bureaucratic problems. These tools were often built by individuals who had “learned enough machine learning over a weekend”, he said.

In terms of the challenges associated with AI deployments by US federal bodies, Prof Engstrom cited “gaming” and “adversarial learning” as emerging problems—essentially, these tactics amounted to deliberate attempts to confuse algorithms. He also suggested that conflicts of interest might become a problem, should external software developers seek to “commercialise their insider understanding” of government agencies’ operations, after helping to co-create tools with them. Prof Engstrom also highlighted the likelihood that regulatory transgressors would change their behaviour in response to AI-assisted tools being developed by regulatory agencies. This outcome would require agencies to develop, and review, their AI-assisted tools iteratively, he suggested.

Why was it so important for the algorithmic decision-making tools used by US federal agencies to be as fair and transparent as possible? According to Prof Engstrom, unlike in jurisdictions such as the UK, decisions taken by US federal agencies are not typically amenable to judicial review. A change in the law to introduce such oversight would require a “pretty seismic change in the way American administrative justice works”, he observed.

In the absence of judicial review reform, or legislative action by the US Congress, Prof Engstrom concluded his talk by outlining various options by which AI-assisted federal agency tools might be subject to “algorithmic accountability” – essentially, scrutiny and / or testing. One possibility might be to define decision-making algorithms as being an “administrative rule”—thereby subjecting them to a “notice and comment” process before they could be adopted. In Prof Engstrom’s view, private enforcement via litigation, impact assessments, and “prospective benchmarking” were also options worth considering. Under a prospective benchmarking regime, he explained, a randomly selected group of decisions, reached by AI-assisted tools, would be directly compared with those made by humans. This comparative process would allow an assessment to be made about whether any problematic decisions were being reached by either method, he said. This proposition was, he suggested, “a promising way to think about this issue”.
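The benchmarking process described above—randomly sampling cases, routing each through both the AI-assisted tool and a human decision-maker, then auditing the two sets of outcomes—can be sketched in a few lines. The callables `ai_decide`, `human_decide`, and `audit` are hypothetical stand-ins for whatever decision and review procedures an agency actually uses:

```python
import random

def prospective_benchmark(cases, ai_decide, human_decide, audit,
                          sample_size=100, seed=42):
    """Compare AI-assisted and human decisions on a random sample of cases.

    `ai_decide` and `human_decide` each return a decision for a case;
    `audit` returns True when a (case, decision) pair is judged
    problematic on review. All three are hypothetical placeholders.
    """
    rng = random.Random(seed)  # fixed seed so an audit is reproducible
    sample = rng.sample(cases, min(sample_size, len(cases)))
    ai_errors = sum(audit(case, ai_decide(case)) for case in sample)
    human_errors = sum(audit(case, human_decide(case)) for case in sample)
    return {"sampled": len(sample),
            "ai_errors": ai_errors,
            "human_errors": human_errors}
```

The point of the random sample is that both decision methods face the same cases, so any gap in error rates can be attributed to the method rather than to case mix.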

The Stanford University research, Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies (February 2020), is available to download.