Many of the techniques used by artificial intelligence systems rely on training with vast quantities of data, drawn from sources that range from official statistics to health records to online metadata to sensors and satellites.
We live in an age of incredible and increasing growth in computing power. When the first personal computers were introduced to the public, in 1977, a top-of-the-line machine, the Apple II, came with 4KB of RAM, a screen that could show only 24 lines of text (and even then, only in capital letters), and an audio cassette interface (remember those!) for reading and recording data.1 But fast forward to the present. We now have computers in the form of phones that fit in our pockets and that, in the case of the 2020 Apple iPhone 12 Pro, can hold roughly 1.5 million times as much information in memory as the first Apple II. That rate of growth is astounding, and we’ve witnessed equally exponential growth in our ability to collect and record information in digital form. Unfortunately, we’ve witnessed the same growth in the ability of others to collect information about us (figure 1b).
The act of collecting and recording data about people is, of course, not new at all. From the registers of the dead that were published by church officials in the early modern era to the counts of Indigenous populations that appeared in colonial accounts of the Americas, data collection has long been employed as a technique of consolidating knowledge about the people whose data are collected, and therefore consolidating power over their lives.2 The close relationship between data and power is perhaps most clearly visible in the historical arc that begins with the logs of people captured and placed aboard slave ships, reducing richly lived lives to numbers and names. It passes through the eugenics movement, in the late nineteenth and early twentieth centuries, which sought to employ data to quantify the superiority of white people over all others. It continues today in the proliferation of biometrics technologies that, as sociologist Simone Browne has shown, are disproportionately deployed to surveil Black bodies.3
When Edward Snowden, the former US National Security Agency contractor, leaked his cache of classified documents to the press in 2013, he revealed the degree to which the federal government routinely collects data on its citizens—often with minimal regard to legality or ethics.4 At the municipal level, too, governments are starting to collect data on everything from traffic movement to facial expressions in the interests of making cities “smarter.”5 This often translates to reinscribing traditional urban patterns of power such as segregation, the overpolicing of communities of color, and the rationing of ever-scarcer city services.6
But the government is not alone in these data-collection efforts; corporations do it too—with profit as their guide. The words and phrases we search for on Google, the times of day we are most active on Facebook, and the number of items we add to our Amazon carts are all tracked and stored as data—data that are then converted into corporate financial gain. The most trivial of everyday actions—searching for a way around traffic, liking a friend’s cat video, or even stepping out of our front doors in the morning—are now hot commodities. This is not because any of these actions are exceptionally interesting (although we do make an exception for Catherine’s cats) but because these tiny actions can be combined with other tiny actions to generate targeted advertisements and personalized recommendations—in other words, to give us more things to click on, like, or buy.7
This is the data economy, and corporations, often aided by academic researchers, are currently scrambling to see what behaviors—both online and off—remain to be turned into data and then monetized. Nothing is outside of datafication, as this process is sometimes termed—not your search history, or Catherine’s cats, or the butt that Lauren is currently using to sit in her seat. To wit: Shigeomi Koshimizu, a Tokyo-based professor of engineering, has been designing matrices of sensors that collect data at 360 different positions around a rear end while it is comfortably ensconced in a chair.8 He proposes that people have unique butt signatures, as unique as their fingerprints. In the future, he suggests, our cars could be outfitted with butt-scanners instead of keys or car alarms to identify the driver.
Although datafication may occasionally verge into the realm of the absurd, it remains a very serious issue. Decisions of civic, economic, and individual importance are already and increasingly being made by automated systems sifting through large amounts of data. For example, PredPol, a so-called predictive policing company founded in 2012 by an anthropology professor at the University of California, Los Angeles, has been employed by the City of Los Angeles for nearly a decade to determine which neighborhoods to patrol more heavily, and which neighborhoods to (mostly) ignore. But because PredPol is based on historical crime data and US policing practices have always disproportionately surveilled and patrolled neighborhoods of color, the predictions of where crime will happen in the future look a lot like the racist practices of the past.9 These systems create what mathematician and writer Cathy O’Neil, in Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, calls a “pernicious feedback loop,” amplifying the effects of racial bias and of the criminalization of poverty that are already endemic to the United States.
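The feedback loop described above can be made concrete with a deliberately simplified sketch. This is not PredPol's algorithm, which is not described here; it is a toy model under stated assumptions: two hypothetical neighborhoods with the same true crime rate, a starting record that is skewed toward the first one, and a rule that sends every patrol to whichever neighborhood has the most recorded crime. All names and numbers are invented for illustration.

```python
def simulate_patrols(recorded, true_rate, rounds, patrols_per_round=10):
    """Toy model: each round, all patrols go to the neighborhood with the most
    recorded crime, and patrols record crime in proportion to the true rate."""
    recorded = list(recorded)
    history = [tuple(recorded)]
    for _ in range(rounds):
        # "Prediction": choose the neighborhood with the most records so far.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        # Only the patrolled neighborhood generates new records.
        recorded[target] += patrols_per_round * true_rate[target]
        history.append(tuple(recorded))
    return history


def share_of_first(counts):
    """Fraction of all recorded crime attributed to the first neighborhood."""
    return counts[0] / sum(counts)


# Hypothetical inputs: identical true rates, historically skewed records.
history = simulate_patrols(recorded=[60, 40], true_rate=[0.5, 0.5], rounds=5)
for round_number, counts in enumerate(history):
    print(round_number, counts, round(share_of_first(counts), 2))
```

In this sketch the two neighborhoods are identical in everything except their recorded history, yet the first one's share of recorded crime rises every round, because the records measure where patrols were sent rather than where crime occurred.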
O’Neil’s solution is to open up the computational systems that produce these racist results. Only by knowing what goes in, she argues, can we understand what comes out. Transparency is a key step in the project of mitigating the effects of biased data. Yet we can do more than audit discriminatory systems after the fact. Our current world requires more, and this is where data feminism comes in.
Data feminism is a way of thinking about data, their analysis, and their display that is informed by the rich history of feminist activism and feminist critical thought. Data feminism begins with a belief in gender equality, and a recognition that achieving equality for folks of all genders (and all races, and all sexual orientations, and all locations in the world) requires a commitment to examining the root cause of the inequalities that certain individuals and groups face today. In the case of PredPol, data feminism would additionally require that we trace its biased data back to their source. The root cause of the racial bias in the “three most objective data points” that PredPol employs is the long history of the criminalization of Blackness in the United States, which produces biased policing practices, which produce biased historical data, which are then used to develop risk models for the future.10 Tracing these links to historical and ongoing forces of oppression can help us answer the ethical question, Should this system exist?11 This is the work of data feminism too. And in the case of PredPol, the answer is a resounding no.
Understanding this long and complicated chain reaction is what has motivated Yeshimabeit Milner, along with US-based activists, organizers, and mathematicians, to found Data for Black Lives, an organization dedicated to “using data science to create concrete and measurable change in the lives of Black communities.”12 Groups like the Stop LAPD Spying coalition are using explicitly feminist and antiracist methods to quantify and challenge invasive data collection by law enforcement.13 Data journalists are reverse-engineering algorithms and collecting qualitative data at scale about maternal harm.14 Artists are inviting participants to perform ecological maps and using AI for making intergenerational family memoirs (figure 2a).15
This work is by no means limited to the Global North. In Tanzania, for example, the group DataZetu (“Our Data” in Swahili) worked with community partners to run a design competition for creating fabrics with statistics about gender-based violence embedded in the patterns, and then held a fashion show with the winners (figure 2b). Activists in Latin America are documenting the women, girls, and trans people murdered in feminicides, and civil society groups are creating data standards and building networks to use data to challenge gender-based violence.16 In Argentina, groups like Economia Feminista (“Feminist Economics” in Spanish) are using crowdsourced data to build feminist voter guides, which they distribute through a website called Feminindex (figure 2c). The list goes on.
All these projects are data science. Many people think of data as numbers alone, but as these projects demonstrate, data can also consist of words or stories, colors or sounds, or any type of information that is systematically collected, organized, and analyzed. The science in data science simply implies a commitment to systematic methods of observation and experiment. Throughout this article, we deliberately place diverse data science examples alongside each other. They come from individuals and small groups, and from across academic, artistic, nonprofit, journalistic, community-based, and for-profit organizations. This is due to our belief in a capacious definition of data science, one that seeks to include rather than exclude, and does not erect barriers based on formal credentials, professional affiliation, size of data, complexity of technical methods, or other external markers of expertise. Such markers, after all, have long been used to prevent women from fully engaging in any number of professional fields, even as those fields—which include data science and computer science, among many others—were largely built on the knowledge that women were required to teach themselves.17 An attempt to push back against this gendered history is foundational to data feminism, too.
Feminism has been defined and used in many ways. Here and in our book, we employ the term feminism as a shorthand for the diverse and wide-ranging projects that name and challenge sexism and other forces of oppression, as well as those which seek to create more just, equitable, and livable futures. Because of this broadness, some scholars prefer to use the term feminisms, which clearly signals the range of—and, at times, the incompatibilities among—these various strains of feminist activism and political thought. For reasons of readability, we choose to use the term feminism here, but our feminism is intended to be just as expansive. It includes the work of regular folks and public intellectuals, as well as organizing groups that have taken direct action to achieve the equality of the sexes. It also includes the work of scholars and other cultural critics who have used writing to explore the social, political, historical, and conceptual reasons behind the inequality of the sexes that we face today.
In the process, these writers and activists have given voice to the many ways in which today’s status quo is unjust.18 These injustices are often the result of historical and contemporary differentials of power, including those among men, women, and nonbinary people, as well as those among cisgender and transgender people, white women and Black women, academic researchers and Indigenous communities, and people in the Global North and the Global South. Feminists analyze these power differentials so that they can change them. And while such a broad focus—one that incorporates race, class, ability, and more—might sound strange to those who think feminism is only about gender, the reality is that any movement for gender equality must consider the ways in which various forms of privilege on the one hand, and oppression on the other, are intersectional.
Because the concept of intersectionality is essential for understanding and applying data feminism, let’s get a bit more specific. The term was coined by legal theorist Kimberlé Crenshaw in the late 1980s.19 In law school, Crenshaw had come across the antidiscrimination case of DeGraffenreid v. General Motors. Emma DeGraffenreid was a Black working mother who had sought a job at a General Motors factory in her town. She was not hired and sued GM for discrimination. The factory did have a history of hiring Black people: many Black men worked in industrial and maintenance jobs there. They also had a history of hiring women: many white women worked there as secretaries. These two pieces of evidence provided the rationale for the judge to throw out the case. Because the company did hire Black people and did hire women, it could not be discriminating based on race or gender. But, Crenshaw wanted to know, what about discrimination on the basis of race and gender together? This was something different, it was real, and it needed to be named.20
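For readers who work with data, the logic of the ruling can be sketched in a few lines. The counts below are entirely hypothetical (they are not figures from the case) and are chosen only to mirror the pattern described above: Black men are hired, white women are hired, and Black women are not. The function name and record layout are likewise assumptions made for illustration.

```python
# Hypothetical applicant records as (race, gender, hired) tuples.
# The counts are invented to mirror the pattern in the text, not drawn from the case.
applicants = (
    [("Black", "man", True)] * 8 + [("Black", "man", False)] * 2
    + [("white", "woman", True)] * 8 + [("white", "woman", False)] * 2
    + [("Black", "woman", False)] * 10
    + [("white", "man", True)] * 9 + [("white", "man", False)] * 1
)


def hire_rate(records, race=None, gender=None):
    """Share of matching applicants who were hired; None means 'any'."""
    outcomes = [
        hired
        for r, g, hired in records
        if (race is None or r == race) and (gender is None or g == gender)
    ]
    return sum(outcomes) / len(outcomes)


# One axis at a time: each group appears to be hired.
print("Black applicants:", hire_rate(applicants, race="Black"))
print("Women applicants:", hire_rate(applicants, gender="woman"))
# Both axes together: the group at the intersection is shut out entirely.
print("Black women:", hire_rate(applicants, race="Black", gender="woman"))
```

Analyzed one category at a time, the toy data shows hiring in every group; only the combined breakdown reveals the exclusion, which is the gap that the single-axis reasoning in the case could not see.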
Key to the idea of intersectionality is that it does not only describe the intersecting aspects of any particular person’s identity (or positionalities, as they are sometimes termed).21 It also describes the intersecting forces of privilege and oppression at work in a given society. Oppression involves the systematic mistreatment of certain groups of people by other groups. It happens when power is not distributed equally: when one group controls the institutions of law, education, and culture, and uses its power to systematically exclude other groups while giving its own group unfair advantages (or simply maintaining the status quo).22 In the case of gender oppression, we can point to the sexism, cissexism, and patriarchy that are evident in everything from political representation to the wage gap to who speaks more often (or more loudly) in a meeting. In the case of racial oppression, this takes the form of racism and white supremacy. Other forms of oppression include ableism, colonialism, and classism. Each has its particular history and manifests differently in different cultures and contexts, but all involve a dominant group that accrues power and privilege at the expense of others. Moreover, these forces of power and privilege on the one hand and oppression on the other mesh together in ways that multiply their effects.
The effects of privilege and oppression are not distributed evenly across all individuals and groups, however. For some, they become an obvious and unavoidable part of daily life, particularly for women and people of color and queer people and immigrants: the list goes on. If you are a member of any or all of these (or other) minoritized groups, you experience their effects everywhere, shaping the choices you make (or don’t get to make) each day. These systems of power are as real as rain. But forces of oppression can be difficult to detect when you benefit from them (we call this a privilege hazard in our book). And this is where we come back around to the idea of data feminism. Our starting point is something that feminists know to be a basic truth, but that goes mostly unacknowledged in the field of data science: power is not distributed equally in the world. Those who wield power are disproportionately elite, straight, white, non-disabled, cisgender men from the Global North.23 The work of data feminism is first to tune into how standard practices in data science serve to reinforce these existing inequalities and second to use data science to challenge and change the distribution of power.24 Underlying data feminism is a belief in and commitment to co-liberation: the idea that oppressive systems of power harm all of us, that they undermine the quality and validity of our work, and that they hinder us from creating true and lasting social impact with data science.
Throughout its own history, feminism has consistently had to work to convince the world that it is relevant to people of all genders. We make the same argument: that data feminism is for everybody. (And here we borrow a line from bell hooks.)25 You may have already noticed that the examples we use are not only about women, nor are they created only by women. That’s because data feminism isn’t only about women. It takes more than one gender to have gender inequality and more than one gender to work toward justice. Likewise, data feminism isn’t only for women. Men, nonbinary, and genderqueer people are proud to call themselves feminists and use feminism in their work. Furthermore, data feminism isn’t only about gender. Intersectional feminists have keyed us into how race, class, sexuality, ability, age, religion, geography, and more are factors that together influence each person’s experience and opportunities in the world. Data feminism is about power—about who has it and who doesn’t. Intersectional feminism examines unequal power. And in our contemporary world, data is power too. Because the power of data is wielded unjustly, it must be challenged and changed.
Data are a double-edged sword. In a very real sense, data have been used as a weapon by those in power to consolidate their control over people as well as places and things. Indeed, a central goal of Data Feminism is to show how governments and corporations have long employed data and statistics as management techniques to preserve an unequal status quo. Working with data from a feminist perspective requires knowing and acknowledging this history. But this flawed history does not mean ceding control of the future to the powers of the past. Data are part of the problem, to be sure. But they are also part of the solution. Another central goal of the data feminism project is to show how the power of data can be wielded back.
To guide us in this work, we have developed seven core principles. Individually and together, these principles emerge from the foundation of intersectional feminist thought. Each of the chapters in our book is structured around a single principle. The seven principles of data feminism are as follows:
Examine power. Data feminism begins by analyzing how power operates in the world.
Challenge power. Data feminism commits to challenging unequal power structures and working toward justice.
Elevate emotion and embodiment. Data feminism teaches us to value multiple forms of knowledge, including the knowledge that comes from people as living, feeling bodies in the world.
Rethink binaries and hierarchies. Data feminism requires us to challenge the gender binary, along with other systems of counting and classification that perpetuate oppression.
Embrace pluralism. Data feminism insists that the most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, Indigenous, and experiential ways of knowing.
Consider context. Data feminism asserts that data are not neutral or objective. They are the products of unequal social relations, and this context is essential for conducting accurate, ethical analysis.
Make labor visible. The work of data science, like all work in the world, is the work of many hands. Data feminism makes this labor visible so that it can be recognized and valued.
Data Feminism graphics by Catherine D'Ignazio, Lauren Klein and Marcia Diaz, 2020. Spanish translation by Helena Suárez Val. These graphics are open access and also available in French, Korean and Portuguese. They may be downloaded here: http://datafeminism.io/blog/book/data-feminism-infographic/
In our book, we explore each of these principles in more detail, drawing upon examples from the field of data science, expansively defined, to show how that principle can be put into action. Along the way, we introduce key feminist concepts like the matrix of domination (Patricia Hill Collins), situated knowledge (Donna Haraway), and emotional labor (Arlie Hochschild), as well as some of our own ideas about what data feminism looks like in theory and practice. To this end, we introduce readers to a range of folks at the cutting edge of data and justice. These include engineers and software developers, activists and community organizers, data journalists, artists, and scholars. This variety of people, and the variety of projects they have helped to create, is our way of answering the question: What makes a data science project feminist? As will become clear, a data science project may be feminist in content, in that it challenges power by choice of subject matter; in form, in that it challenges power by shifting the aesthetic and/or sensory registers of data communication; and/or in process, in that it challenges power by building participatory, inclusive processes of knowledge production. What unites this broad scope of data work is a commitment to action and a desire to remake the world to be more equitable and inclusive.
Our overarching goal is to take a stand against the status quo, against a world that benefits us, two white, cisgender, non-disabled college professors located in the Global North, at the expense of others. Our principles are intended to function as concrete steps to action for data scientists seeking to learn how feminism can help them work toward justice, and for feminists seeking to learn how their own work can carry over to the growing field of data science. They are also addressed to professionals in all fields in which data-driven decisions are being made, as well as to communities that want to resist or mobilize the data that surrounds them. They are written for everyone who seeks to better understand the charts and statistics that they encounter in their day-to-day lives, and for everyone who seeks to communicate the significance of such charts and statistics to others.
Our claim, once again, is that data feminism is for everyone. It’s for people of all genders. It’s by people of all genders. And most importantly: it’s about much more than gender. Data feminism is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed using data. We invite you to join us on this journey toward justice and toward remaking our data-driven world.
More About Data Feminism
Data Feminism is an open access book published by MIT Press in 2020. You can read it for free online at https://data-feminism.mitpress.mit.edu/ or buy it from your local independent bookstore.
Alexander, Michelle. The New Jim Crow. The New Press, 2012.
Anderson, Margo J. The American Census: A Social History. Yale University Press, 2015.
Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. 1st edition. Medford, MA: Polity, 2019.
Browne, Simone. Dark Matters: On the Surveillance of Blackness. Duke University Press, 2015.
CHM. ‘Home’. CHM. Accessed 23 May 2021. https://computerhistory.org/.
Cooper, Brittney C., Susana M. Morris, and Robin M. Boylorn, eds. The Crunk Feminist Collection. New York: The Feminist Press at CUNY, 2017.
Crenshaw, Kimberlé. ‘Demarginalizing the Intersection of Race and Sex: A Black Feminist Critique of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics’. University of Chicago Legal Forum 1989, no. 1 (7 December 2015). https://chicagounbound.uchicago.edu/uclf/vol1989/iss1/8.
———. ‘Mapping the Margins: Intersectionality, Identity Politics, and Violence against Women of Color’. Stanford Law Review 43, no. 6 (1991): 1241–99. https://doi.org/10.2307/1229039.
Crossman, Ashley. ‘What Sociology Can Teach Us About Oppression’. ThoughtCo. Accessed 23 May 2021. https://www.thoughtco.com/social-oppression-3026593.
D’Ignazio. ‘Catherine D’Ignazio on Instagram: “There’s Truly Nothing More Pleasing than a #MaineCoonCat in a Rainbow Lei”’. Accessed 23 May 2021. https://www.instagram.com/p/BgxGicVhhTW/.
DuVernay, Ava. 13th. Documentary, Crime, History. Forward Movement, Kandoo Films, Netflix, 2016.
Ehrenreich, Barbara. Witches, Midwives, and Nurses: A History of Women Healers. 2nd edition. New York City: The Feminist Press at CUNY, 2010.
Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York, NY: St. Martin’s Press, 2018.
Fake, Caterina. ‘Should This Exist’. Should This Exist? Accessed 23 May 2021. https://shouldthisexist.com/.
Farrell, Molly. Counting Bodies: Population in Colonial American Writing. Oxford University Press, 2016.
Gallardo, Adriana. ‘How We Collected Nearly 5,000 Stories of Maternal Harm’. ProPublica. Accessed 23 May 2021. https://www.propublica.org/article/how-we-collected-nearly-5-000-stories-of-maternal-harm?token=gKXJz8oaquJjtQQXfWohwVdXFUbpvysX.
hooks, bell. Feminism Is for Everybody: Passionate Politics. Pluto Press, 2000.
Hugs, Robot. ‘Having Trouble Explaining Oppression? This Comic Can Do It for You’. Everyday Feminism, 30 January 2017. https://everydayfeminism.com/2017/01/trouble-explaining-oppression/.
May, Vivian M. ‘Intellectual Genealogies, Intersectionality, and Anna Julia Cooper’. In Feminist Solidarity at the Crossroads. Routledge, 2012. https://doi.org/10.4324/9780203145050-12.
Johnson, Jessica Marie. ‘Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads’. Social Text 36, no. 4 (137) (1 December 2018): 57–79. https://doi.org/10.1215/01642472-7145658.
Koh, Yoree. ‘Forget Fingerprints: Car Seat IDs Driver’s Rear End’. Wall Street Journal, 18 January 2012, sec. Driver’s Seat. https://www.wsj.com/articles/BL-DSB-8296.
Light, Jennifer S. ‘When Computers Were Women’. Technology and Culture 40, no. 3 (1999): 455–83.
Logipix Ltd. ‘Safe and Smart Cities’. Logipix. Accessed 23 May 2021. http://www.logipix.com/index.php/safe-and-smart-cities.
Mattern, Shannon. ‘Mission Control: A History of the Urban Dashboard’. Places Journal, 9 March 2015. https://doi.org/10.22269/150309.
Angwin, Julia, Jeff Larson, Lauren Kirchner, and Surya Mattu. ‘Machine Bias’. ProPublica. Accessed 23 May 2021. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing?token=z-JINiwkIjlB5pxCVLzVjG9Q2IGCjWJS.
Mogel, Lize. ‘Walking the Watershed-The Project’. Accessed 23 May 2021. https://www.walkingthewatershed.com/home/.
Moraga, Cherríe. This Bridge Called My Back, Fourth Edition: Writings by Radical Women of Color. Edited by Gloria Anzaldúa. 4th edition. Albany: State University of New York Press, 2015.
O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Penguin UK, 2016.
PredPol. ‘About Us’. PredPol (blog). Accessed 23 May 2021. https://www.predpol.com/about/.
———. ‘Predictive Policing Technology’. PredPol (blog). Accessed 23 May 2021. https://www.predpol.com/technology/.
Raley, Rita. ‘Dataveillance and Countervailance’, 2013. https://escholarship.org/uc/item/2b12683k.
Simon, Patrick. ‘Collecting Ethnic Statistics in Europe: A Review’. Ethnic and Racial Studies 35, no. 8 (1 August 2012): 1366–91. https://doi.org/10.1080/01419870.2011.607507.
Spade, Dean, and Rori Rohlfs. ‘Legal Equality, Gay Numbers and the (After?)Math of Eugenics’. S&F Online. Accessed 23 May 2021. http://sfonline.barnard.edu/navigating-neoliberalism-in-the-academy-nonprofits-and-beyond/dean-spade-rori-rohlfs-legal-equality-gay-numbers-and-the-aftermath-of-eugenics/.
Stop LAPD Spying Coalition. ‘“To Observe and to Suspect”: A People’s Audit of the Los Angeles Police Department’s Special Order 1’, 19 March 2013. https://stoplapdspying.org/wp-content/uploads/2013/04/PEOPLES-AUDIT-UPDATED-APRIL-2-2013-A.pdf.
The Latin American Initiative for Open Data. ‘Guía Para Protocolizar Procesos de Identificación de Feminicidios’. ILDA (blog), 1 September 2020. http://idatosabiertos.org/guia-para-protocolizar-procesos-de-identificacion-de-feminicidios/.
Vaz, Kim Marie, and Gary L Lemons. Feminist Solidarity at the Crossroads: Intersectional Women’s Studies for Transracial Alliance. New York: Routledge, 2012.
Wernimont, Jacqueline. Numbered Lives: Life and Death in Quantum Media. MIT Press, 2019.