Digital Curator Mia Ridge writes, 'It seems a good moment to share some of the articles we've discussed as a primer on how and why technologies and working practices in libraries and digital scholarship are not neutral'.
'Do the best you can until you know better. Then when you know better, do better.'
― Attributed to Maya Angelou
The Digital Scholarship Reading Group is one of the ways the Digital Research team help British Library staff grapple with emerging technologies and methods that could be used in research and scholarship with collections. Understanding the impact of the biases that new technologies such as AI and machine learning can introduce through algorithmic or data sourcing decisions has been an important aspect of these discussions since the group was founded in 2016. As we began work on what would eventually become the Living with Machines project, our readings became particularly focused on AI and data science, aiming to ensure that we didn't do more harm than good.
Reading is only the start of the anti-racism work we need to do. However, reading and discussing together, and bringing the resulting ideas and questions into discussions about procuring, implementing and prioritising digital platforms in cultural and research institutions is a relatively easy next step.
I've listed the topics under the dates we discussed them, and sometimes added a brief note on how each is relevant to intersectional issues of gender, racism and digital scholarship or commercial digital methods and tools. We always have more to learn about these issues, so we'd love to hear your recommendations for articles or topics (contact details here).
Abstract: This case study describes a project undertaken at the University of Minnesota Libraries to digitize materials related to African American history and culture across the University's holdings, and to highlight materials that are otherwise undiscoverable in existing archival collections. It explores how historical and current archival practices marginalize material relevant to African American history and culture, and how a mass digitization process can attempt to highlight and re-aggregate those materials. The details of the aggregation process — e.g. the need to use standardized vocabularies to increase aggregation even when those standardized vocabularies privilege majority representation — also reveal important issues in mass digitization and aggregation projects involving the history of marginalized groups.
Discussed June 2020.
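The trade-off the abstract mentions, where a standardized vocabulary increases aggregation but can privilege majority representation, is easy to see in miniature. Below is a hypothetical Python sketch (the vocabulary mapping, field names and records are all invented for illustration, not drawn from the case study): records catalogued under variant local headings only come together once they are mapped to a shared controlled term, and anything the vocabulary doesn't cover stays scattered.

```python
# Hypothetical sketch, not from the case study: records catalogued with
# variant local subject headings are aggregated under one controlled term.
# The vocabulary mapping and the records below are invented for illustration.

CONTROLLED_VOCAB = {
    "afro-american history": "African American history",
    "black history -- united states": "African American history",
    "african american history": "African American history",
}

records = [
    {"id": "mss-001", "subject": "Afro-American history"},
    {"id": "mss-002", "subject": "Black history -- United States"},
    {"id": "mss-003", "subject": "African American history"},
]

aggregated = {}
for record in records:
    # Normalise the local heading; an unmapped heading falls through
    # unchanged, which is exactly where material described outside the
    # standard vocabulary risks staying scattered.
    term = CONTROLLED_VOCAB.get(record["subject"].lower(), record["subject"])
    aggregated.setdefault(term, []).append(record["id"])

print(aggregated)
# {'African American history': ['mss-001', 'mss-002', 'mss-003']}
```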
For this Reading Group Session, we will be doing something a little different and discussing a podcast on The Nightmare of Surveillance Capitalism. The podcast is hosted by Talking Politics, and features a discussion with Shoshana Zuboff, who has recently published The Age of Surveillance Capitalism (January 2019).
For those of you who would also like to bring some reading to the table, we can also consult the reviews of this book as a way of engaging with reactions to the topic. Listed below are a few examples, but please bring along any reviews that you find to be especially thought-provoking:
Discussed November 2019. Computational or algorithmic 'surveillance' and capitalism have clear links to structural inequalities.
Kate Crawford, Distinguished Research Professor at New York University, a Principal Researcher at Microsoft Research New York, and the co-founder and co-director of the AI Now Institute, discusses the biases built into machine learning, and what that means for the social implications of AI. The talk is the fourth event in the Royal Society's 2018 series: You and AI.
Discussed October 2018.
'Facial Recognition Is Accurate, if You’re a White Guy'
Read or watch any one of:
'Facial Recognition Is Accurate, if You’re a White Guy' By Steve Lohr
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification by Joy Buolamwini, Timnit Gebru
Abstract: Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.
How I'm fighting bias in algorithms (TED Talk) by Joy Buolamwini
Abstract: MIT grad student Joy Buolamwini was working with facial analysis software when she noticed a problem: the software didn't detect her face -- because the people who coded the algorithm hadn't taught it to identify a broad range of skin tones and facial structures. Now she's on a mission to fight bias in machine learning, a phenomenon she calls the "coded gaze." It's an eye-opening talk about the need for accountability in coding ... as algorithms take over more and more aspects of our lives.
Discussed April 2018, topic suggested by Adam Farquhar.
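To make the evaluation approach in the Gender Shades abstract concrete, here is a minimal Python sketch of disaggregated error reporting, with invented data rather than the paper's benchmarks: error rates are computed per subgroup instead of as a single overall accuracy, which is how disparities like the 34.7% versus 0.8% gap above become visible.

```python
# Minimal sketch of disaggregated evaluation, in the spirit of the paper's
# subgroup error rates; the predictions below are invented, not real data.

from collections import defaultdict

# Each record: (true label, predicted label, subgroup).
predictions = [
    ("female", "male",   "darker female"),
    ("female", "female", "darker female"),
    ("female", "male",   "darker female"),
    ("male",   "male",   "darker male"),
    ("female", "female", "lighter female"),
    ("male",   "male",   "lighter male"),
]

totals, errors = defaultdict(int), defaultdict(int)
for true_label, predicted, subgroup in predictions:
    totals[subgroup] += 1
    if predicted != true_label:
        errors[subgroup] += 1

for subgroup in sorted(totals):
    rate = errors[subgroup] / totals[subgroup]
    print(f"{subgroup}: {rate:.0%} error ({errors[subgroup]}/{totals[subgroup]})")
```

A single aggregate figure for this toy data (two errors across six predictions) would hide the fact that every error falls on one subgroup.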
Abstract: In this article I reflect on the process of conducting historical research in digital archives from a feminist perspective. After reviewing issues that arose in conjunction with the British Library’s digitisation of the feminist magazine Spare Rib in 2013, I offer three questions researchers should consider before consulting materials in a digital archive. Have the individuals whose work appears in these materials consented to this? Whose labour was used and how is it acknowledged? What absences must be attended to among an abundance of materials? Finally, I suggest that researchers should draw on the existing body of scholarship about these issues by librarians and archivists.
Discussed October 2017.
From their introduction: 'we are also invested in the development of a practice-based digital humanities that attends to the crucial issues of race, class, gender, and sexuality in the undergraduate classroom and beyond. Our White Violence, Black Resistance project merges foundational digital humanities approaches with issues of social justice by engaging students and the community in digitizing and interpreting historical moments of racial conflict. The project exemplifies an activist model of grassroots recovery that brings to light timely historical documents at the same time that it exposes power differentials in our own institutional settings and reveals the continued racial violence spanning 1868 Millican, Texas, to 2014 Ferguson, Missouri.'
Discussed August 2017.
Abstract: Literary study in the digital humanities is not exempt from reproducing historical hierarchies by focusing on major or canonical figures who have already been recognized as important historical or literary figures. However, network analysis of periodical publications may offer an alternative to the biases of human memory, where one has the tendency to pay attention to a recognizable name, rather than one that has had no historical significance. It thus enables researchers to see connections and a wealth of data that has been obscured by traditional recovery methodologies. Machine reading with network analysis can therefore contribute to an alternate understanding of women’s history, one that reinterprets cultural and literary histories that tend to reconstruct gender-based biases. This paper uses network analysis to explore the Fabian News, a late nineteenth-century periodical newsletter produced by the socialist Fabian Society, to recover women activists committed to social and political equality.
Discussed July 2017.
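As a rough illustration of the kind of method the abstract describes (assumptions mine; the issues and names below are invented placeholders, not data from the Fabian News study), a co-occurrence network can be built from issue-level mentions and ranked by centrality, so that well-connected figures surface regardless of whether their names are canonical.

```python
# Toy sketch of periodical network analysis using networkx; the issues and
# names are invented placeholders, not data from the Fabian News study.

import itertools
import networkx as nx

# People mentioned together in each issue of a periodical.
issues = {
    "1891-01": ["Writer A", "Writer B", "Writer C"],
    "1891-02": ["Writer C", "Writer D"],
    "1891-03": ["Writer B", "Writer C", "Writer D"],
}

graph = nx.Graph()
for people in issues.values():
    # Link every pair of people who appear in the same issue,
    # weighting repeated co-occurrences.
    for a, b in itertools.combinations(people, 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

# Degree centrality ranks people by how widely connected they are,
# independent of how recognizable their names happen to be.
for name, score in sorted(nx.degree_centrality(graph).items(),
                          key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```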
From the introduction: At issue is the claim that the machines, structures, and systems of modern material culture can be accurately judged not only for their contributions of efficiency and productivity, not merely for their positive and negative environmental side effects, but also for the ways in which they can embody specific forms of power and authority.
Discussed April 2017. A classic text from 1980 that describes how seemingly simple design factors can contribute to structural inequalities.
Abstract: Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what 'research' means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.
Discussed August 2016, suggested by Aquiles Alencar Brayner.
This blog post is by Mia Ridge, Digital Curator for Western Heritage Collections and Co-Investigator for Living with Machines. She's on Twitter at @mia_out.