In June of 2018, the Algorithmic Fairness and Opacity Working Group (AFOG) held a summer workshop with the theme “Algorithms are Opaque and Unfair: Now What?” The event was organized by Berkeley I School professors (and AFOG co-directors) Jenna Burrell and Deirdre Mulligan, postdoc Daniel Kluttz, and Allison Woodruff and Jen Gennai from Google. Our working group is generously sponsored by Google Trust and Safety and hosted at the UC Berkeley School of Information.
Inspired by questions that came up at our biweekly working group meetings during the 2017-2018 academic year, we organized four panels for the workshop. The panel topics raised issues that we felt required deeper consideration and debate. To make progress we brought together a diverse, interdisciplinary group of experts from academia, industry, and civil society in a workshop-style environment. In panel discussions, we considered potential ways of acting on algorithmic (un)fairness and opacity. We sought to consider the fullest possible range of ‘solutions,’ including technical implementations (algorithms, user-interface designs), law and policy, standard-setting, incentive programs, new organizational processes, labor organizing, and direct action.
Researchers (e.g., Barocas and Selbst 2016; Kleinberg et al. 2017), journalists (e.g., Miller 2015), and even the federal government (e.g., Executive Office of the President 2016) have become increasingly attuned to issues of algorithmic opacity, bias, and fairness, debating them across a range of applications, including criminal justice (Angwin et al. 2016; Chouldechova 2017; Berk et al. 2017), online advertising (Datta et al. 2018), natural language processing (Bolukbasi et al. 2016), consumer credit (Waddell 2016), and image recognition (Simonite 2017; Buolamwini and Gebru 2018).
There has been recent progress, especially in understanding algorithmic fairness as a technical problem. Drawing from various formal definitions of fairness (see Narayanan 2018; Corbett-Davies and Goel 2018; Kleinberg et al. 2017), researchers have identified a range of techniques for addressing fairness in algorithm-driven classification and prediction. Some approaches focus on addressing allocative harms by fairly allocating opportunities or resources. These include fairness through awareness (Dwork et al. 2012), accuracy equity (Angwin et al. 2016; Dieterich et al. 2016), equality of opportunity (Hardt et al. 2016), and fairness constraints (Zafar et al. 2017). Other approaches tackle representational harms, which occur when a system diminishes specific groups or reinforces stereotypes based on identity (see Crawford 2017). Proposed solutions include corpus-level constraints to prevent the amplification of gender stereotypes in language corpora (Zhao et al. 2017), diversity algorithms (Drosou et al. 2017), causal reasoning to assess whether a protected attribute has an effect on a predictor (Kilbertus et al. 2017; Kusner et al. 2017), and inclusive benchmark datasets to address intersectional accuracy disparities (Buolamwini and Gebru 2018).
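To make one of these formal definitions concrete: equality of opportunity, as commonly summarized, asks that individuals who truly belong to the positive class be predicted positive at the same rate regardless of group membership, that is, that true positive rates be equal across groups. The following is a minimal sketch under stated assumptions. The toy data, the group labels "a" and "b", and the function names are hypothetical illustrations and are not taken from any of the cited papers.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return the true positive rate for each group.

    Assumes binary labels (1 = positive, 0 = negative) and that every
    group has at least one truly positive example.
    """
    positives = defaultdict(int)       # truly positive examples per group
    true_positives = defaultdict(int)  # of those, how many were predicted positive
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups.

    A gap of 0 means the predictions satisfy equality of opportunity
    exactly on this data; larger gaps indicate larger disparities.
    """
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: eight individuals in two groups.
y_true = [1, 1, 0, 1, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = true_positive_rates(y_true, y_pred, groups)
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(rates, gap)
```

In this toy data, group "a" has two of its three truly positive individuals predicted positive, while group "b" has only one of three, so the gap is one third. A check like this measures a disparity under one definition only; other definitions mentioned above can disagree with it on the same data.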
These new approaches are invaluable in motivating technical communities to think about the issues and make progress on addressing them. But the conversation neither starts nor ends there. Our interdisciplinary group sought to complement and challenge the technical framing of fairness and opacity issues. In our workshop, we considered the strengths and limitations of a technical approach and discussed where and when hand-offs, human augmentation, and oversight are valuable and necessary. We considered ways of engaging a wide-ranging set of perspectives and roles, including professionals with deep domain expertise, activists involved in reform efforts, financial auditors, scholars, and diverse system users and their allies. In doing so, we looked for models that might be transferable from other fields, including network security, financial auditing, safety-critical systems, and civil rights campaigns.
Below is a brief summary of the panel topics and general themes of the discussion. Full write-ups for each panel are linked. Our aim in these write-ups is not simply to report a chronological account of the panel, but to synthesize and extend the panel discussions. These panel reports take a position on the topic and offer a set of concrete proposals. We also seek to identify areas of limited knowledge, open questions, and research opportunities. We intend for these documents to inform an audience of researchers, implementers, practitioners, and policy-makers.
Panel 1 was entitled “What a technical ‘fix’ for fairness can and can’t accomplish.” Panelists and audience members discussed specific examples of problems of fairness (and justice), including cash bail in the criminal justice system, “bad faith” search phrases (e.g., the question, “Did the Holocaust happen?”), and representational harm in image-labeling. Panelists noted a key challenge: technology, on its own, is not good at explaining when it should not be used or when it has reached its limits. Panelists pointed out that understanding broader historical and sociological debates in the domain of application and investigating contemporary reform efforts, for example in criminal justice, can help to clarify the place of algorithmic prediction and classification tools in a given domain. Partnering with civil-society groups can ensure a sound basis for making tough decisions about when and how to intervene when a platform or software is found to be amplifying societal biases, is being gamed by “bad” actors, or otherwise facilitates harm to users. [READ REPORT]
Panelists for Panel 1: Lena Z. Gunn (Electronic Frontier Foundation), Moritz Hardt (UC Berkeley Department of Electrical Engineering and Computer Sciences), Abigail Jacobs (UC Berkeley Haas School of Business), Andy Schou (Google). Moderator: Sarah M. Brown (Brown University Division of Applied Mathematics).
Panel 2, entitled “Automated decision-making is imperfect, but it’s arguably an improvement over biased human decision-making,” took its title from a common rejoinder to criticism of automated decision-making. This panel sought to consider the assumptions of this comparison between humans and machine automation. There is a need to account for differences in the kinds of biases associated with human decision-making (including cognitive biases of all sorts) and those uniquely generated by machine reasoning. The panel discussed the ways that humans rely on or reject decision-support software. For example, work by one of the panelists, Professor Angèle Christin, shows how algorithmic tools deployed in professional environments may be contested or ignored. Guidelines directed at humans about how to use particular systems of algorithmic classification in low- as opposed to high-stakes domains can go unheeded. This seemed to be the case in at least one example of how Amazon’s facial recognition system has been applied in a law-enforcement context. Such cases underscore the point that humans aren’t generally eliminated when automated-decision systems are deployed; humans still decide how these systems are to be configured and implemented, which may disrupt whatever gains in “fairness” might otherwise be realized. Rather than working to establish which is better, human or machine decision-making, we suggest developing research on the most effective ways to bring automated tools and humans together to form hybrid decision-making systems. [READ REPORT]
Panelists for Panel 2: Angèle Christin (Stanford University Department of Communication), Marion Fourçade (UC Berkeley Department of Sociology), M. Mitchell (Google), Josh Kroll (UC Berkeley School of Information). Moderator: Deirdre Mulligan (UC Berkeley School of Information).
Panel 3 on “Human Autonomy and Empowerment” examined how we can enhance the autonomy of humans who are subject to automated decision-making tools. Focusing on “fairness” as a resource allocation or algorithmic problem tends to assume it is something to be worked out by experts. Taking an alternative approach, we discussed how users and other ‘stakeholders’ can identify errors and unfairness and make other kinds of requests to influence and improve the platform or system in question. What is the best way to structure points of user feedback? Panelists pointed out that design possibilities range from lightweight feedback mechanisms to support for richer, agonistic debate. Not-for-profit models, such as Wikipedia, demonstrate the feasibility of high transparency and open debate about platform design. Yet participation on Wikipedia, while technically open to anyone, requires a high investment of time and energy to develop mastery of the platform and the norms of participation. “Flagging” functions, on the other hand, are pervasive, lightweight tools found on most mainstream platforms. However, they often serve primarily to shift governance work onto users without giving those users the potential to fundamentally influence platform policies or practices. Furthermore, limiting consideration to the autonomy of platform users misses the crucial fact that many automated decisions are imposed on people who never use the system directly. [READ REPORT]
Panelists for Panel 3: Stuart Geiger (UC Berkeley Institute for Data Science), Jen Gennai (Google), and Niloufar Salehi (Stanford University Department of Computer Science). Moderator: Jenna Burrell (UC Berkeley School of Information).
Panel 4 was entitled “Auditing Algorithms (from Within and from Without).” Probing issues of algorithmic accountability and oversight, panelists recognized that auditing (whether in finance or safety-critical industries) promotes a culture of “slow down and do a good job,” which runs counter to the “move fast and break things” mindset that has long defined the tech industry. Yet corporations, including those in the tech sector, do have in-house auditing teams (in particular, for financial auditing) whose expertise and practices could serve as models. Generally, internal audits concern the quality of a process rather than the validity of the “outputs.” Panelists pointed out that certain processes developed for traditional auditing might work for auditing “fairness” as well. A “design history file,” for example, is required in the development of medical devices to provide transparency that facilitates FDA review. In the safety-critical arena, there are numerous techniques and approaches, including structured safety cases, hazard analysis, instrumentation and monitoring, and processes for accident investigation. But “fairness” also presents particular challenges to attempts to develop an audit process for algorithms and algorithmic systems. For one, and recalling Panel 1’s discussion, there are numerous valid definitions of fairness. In addition, problems of “fairness” are often not self-evident or exposed through discrete incidents (as accidents are in safety-critical industries). These observations suggest a need to innovate auditing procedures if they are to be applied to the specific challenges of algorithmic fairness. [READ REPORT]
Panelists for Panel 4: Chuck Howell (MITRE), Danie Theron (Google), Michael Tschantz (International Computer Science Institute). Moderator: Allison Woodruff (Google).