"Transparency != Accountability"

danah boyd
EU Parliament: November 7, 2016

This talk was written for an EU Parliament Roundtable. It is a crib; the actual talk came out slightly differently. A video of the actual talk is available here.

Citation: boyd, danah. 2016. "Transparency != Accountability" EU Parliament Event 07/11 Algorithmic Accountability and Transparency. Brussels, November 7.

In the next ten years we will see data-driven technologies reconfigure systems in many different sectors, from autonomous vehicles to personalized learning, predictive policing to precision medicine. While the changes that we will see will create new opportunities, they will also create new challenges — and new worries — and it behooves us to start grappling with these issues now so that we can build healthy sociotechnical systems.

I believe that algorithmic transparency creates false hope. Not only is it technically untenable, but it obfuscates the real politics that are at stake.

Algorithms are nothing more than a set of instructions for a computer to follow. The more complex the technical system, the more difficult it is to discern why algorithms interact with each other the way that they do. Putting complex computer code into the public domain for everyone to inspect does very little to achieve accountability. Consider, for example, the “Heartbleed” vulnerability that was introduced into OpenSSL code in 2011 but wasn’t identified until 2014. Hundreds of thousands of web servers relied on this code for security. Thousands of top-notch computer scientists work with that code on a regular basis. And none of them saw the problem. Everyone agreed about what the “right” outcome should be, and tons of businesses were incentivized to make sure there were no problems. And still, it took two and a half years for an algorithmic vulnerability to be found, hiding in plain sight in source code that was publicly available the entire time.

Transparency does not inherently enable accountability, even when the stars are aligned. To complicate matters, algorithmic transparency gets you nowhere without the data. Take, for example, Facebook’s News Feed. Such systems are designed to adapt to any type of content and evolve based on user feedback (e.g., clicks, likes, etc.). When you hear that something is “personalized,” this means that the data you put into the system are compared with the data others put in, so the results you get are statistically relative to the results others get. People mistakenly assume that personalization means that decisions are made based on your data alone. To the contrary, the whole point is to place your data in relation to others’. Even if you required Facebook to turn over its News Feed algorithm, you’d know nothing without the data. And asking for that data would be a violation of user privacy.
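To make that concrete, here is a minimal, purely illustrative sketch of what “personalized” ranking often looks like in practice: your own clicks are only meaningful once they are weighed against everyone else’s. All user names, story names, and engagement counts below are invented for the example; this is not Facebook’s actual algorithm.

```python
# Illustrative sketch: "personalized" ranking is relative, not individual.
# All users, stories, and engagement counts are invented for this example.
from math import sqrt

# Clicks/likes per user per story (the data each person "puts into the system").
engagement = {
    "you":   {"story_a": 3, "story_b": 0, "story_c": 1},
    "user2": {"story_a": 2, "story_b": 0, "story_c": 2, "story_d": 5},
    "user3": {"story_a": 0, "story_b": 4, "story_c": 0, "story_d": 1},
}

def similarity(u, v):
    """Cosine similarity between two users' engagement vectors."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rank_for(target, data):
    """Score stories for `target` using *other* users' behavior."""
    scores = {}
    for other, items in data.items():
        if other == target:
            continue
        w = similarity(data[target], items)
        for story, count in items.items():
            if story not in data[target]:  # only stories not already in your history
                scores[story] = scores.get(story, 0.0) + w * count
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_for("you", engagement))  # what "you" see depends on everyone else's data
```

Hand an auditor only the `rank_for` function and they can tell you nothing about what any given person will see; the behavior lives in the `engagement` data, which is exactly the part that cannot simply be published.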

The goal shouldn’t be transparency for transparency’s sake. We want to get to accountability. Most folks think that you need algorithmic transparency to achieve accountability. I’m not so sure that’s true. But I do know that we cannot get closer to accountability if we don’t know what values we’re aiming for. We think that if the process is transparent, we can see how unfair decisions were made, but we don’t even know how to define terms like fairness. Is it more fair to give everyone equal opportunity or to combat inequity? Is it better for everyone to have access to the content shared by their friends or should hate speech be censored? Who gets to decide? We have a lot of hard work to do in defining our terms, work that is, in many ways, separate from the hard work of understanding the algorithmic processes that implement the values embedded in those terms. If we can define our terms, a lot more can be done.
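The definitional problem is not a formality. Here is a toy example, with entirely invented hiring data, that computes two common but conflicting fairness criteria over the same set of decisions; the decisions look fair under one definition and unfair under the other, and no amount of transparency settles which definition should win.

```python
# Toy illustration (all data invented): the same decisions can satisfy one
# fairness definition and violate another.
records = [
    # (group, qualified, hired)
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, False),
    ("B", True, True), ("B", False, False), ("B", False, False), ("B", False, False),
]

def selection_rate(group):
    """Demographic parity compares hiring rates across groups."""
    rows = [r for r in records if r[0] == group]
    return sum(hired for _, _, hired in rows) / len(rows)

def true_positive_rate(group):
    """Equal opportunity compares hiring rates among the *qualified* only."""
    rows = [r for r in records if r[0] == group and r[1]]
    return sum(hired for _, _, hired in rows) / len(rows)

for g in ("A", "B"):
    print(g, "selection rate:", selection_rate(g), "TPR:", true_positive_rate(g))

# Equal opportunity holds (both groups' qualified applicants are hired at rate 1.0),
# while demographic parity does not (0.75 vs. 0.25 overall). Which outcome counts
# as "fair" is a policy choice, not a computation.
```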

Personally I’m excited by the technical work that is happening in an area known as “fairness, accountability, and transparency in machine learning” (FATML). An example remedy in this space was proposed by a group of computer scientists who were bothered by how hiring algorithms learned the biases of the training data. They renormalized the training data so that protected categories like race and gender couldn’t be discerned through proxies. To do so, they relied heavily on legal frameworks in the United States that define equal opportunity in employment, demonstrating that when the terms of fairness are clearly defined, they can be computationally protected. This kind of remedy shows the elegant marriage of technology and policy to achieve agreed-upon ends.
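To give a flavor of what “renormalizing” training data can mean, here is a deliberately crude, hypothetical sketch; it is not the method those researchers used. The idea shown is simply to remove group-level differences in a feature (here an invented “proxy_score”) so that the feature’s distribution no longer reveals group membership on average.

```python
# Hedged sketch (not the specific FATML method referenced above): one crude way
# to weaken a proxy feature is to remove group-level differences in its values.
# All field names and numbers are invented for illustration.
from statistics import mean

applicants = [
    {"group": "A", "proxy_score": 8.0}, {"group": "A", "proxy_score": 7.5},
    {"group": "B", "proxy_score": 4.0}, {"group": "B", "proxy_score": 5.0},
]

overall = mean(a["proxy_score"] for a in applicants)
group_means = {
    g: mean(a["proxy_score"] for a in applicants if a["group"] == g)
    for g in {a["group"] for a in applicants}
}

# Shift each group's scores so their means coincide: after this step, the
# feature's average no longer distinguishes the groups (a rough "renormalization").
for a in applicants:
    a["repaired_score"] = a["proxy_score"] - group_means[a["group"]] + overall

print(applicants)
```

The substantive work, as the paragraph above notes, is not this arithmetic; it is the legal and political work of deciding which categories are protected and what “couldn’t be discerned” has to mean.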

No one, least of all a typical programmer, believes that computer scientists should be making the final decision about how to trade off different societal values. But they’re the ones who are programming those values into a system — and if they don’t have clear direction, they’re going to build something that affects people’s lives in unexpected ways. Take, for example, scheduling software. Programmers have been told to maximize retailer efficiency by spreading labor out as much as possible. But that means that workers’ schedules are all over the place, that children suffer, that workers do double shifts without sleep, etc. The problem isn’t the algorithm; it’s how it’s deployed, what maximization goals it uses, and who has the power to adjust the system. If we’re going to deploy these systems, we need to articulate clearly what values we believe are important and then be held accountable for building systems to those standards.
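A tiny sketch, with invented schedules and numbers, shows how much the choice of maximization goal matters: the same selection procedure picks a very different plan depending on whether workers’ lives appear in the objective at all.

```python
# Toy illustration (all numbers invented): the same selection procedure picks a
# different schedule depending on which objective it is told to maximize.

candidate_schedules = [
    # (name, staffing cost, schedule stability for workers: 1.0 = predictable weekly shifts)
    ("fragmented on-call shifts", 900, 0.2),
    ("fixed weekly shifts", 1000, 0.9),
]

def retailer_efficiency(schedule):
    """Objective A: cheaper labor is strictly better; workers' lives don't appear."""
    _, cost, _ = schedule
    return -cost

def efficiency_with_stability(schedule, stability_weight=300):
    """Objective B: the same cost term, plus an explicit value on predictable schedules."""
    _, cost, stability = schedule
    return -cost + stability_weight * stability

print("Objective A picks:", max(candidate_schedules, key=retailer_efficiency)[0])
print("Objective B picks:", max(candidate_schedules, key=efficiency_with_stability)[0])
```

Nothing about the optimization machinery changes between the two runs; only the stated values do. That is the sense in which the problem isn’t the algorithm but the goals it is handed and who gets to set them.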

The increasingly widespread use of algorithms makes one thing crystal clear: our social doctrine is not well-articulated. We believe in fairness, but we can’t define it. We believe in equity, but not if certain people suffer. We believe in justice, but we accept processes that suggest otherwise. We believe in democracy, but our implementation of the systems that could support it is flawed. Computer scientists depend on clarity when they design and deploy algorithms. If we aren’t clear in what we want, accountability doesn’t stand a chance.