
Feminist Data Collection: Building a Vision of an Inclusive System

This review article examines recommendations for, and examples of, implementing the principles of data feminism introduced earlier in this chapter by D’Ignazio and Klein.

Published on Sep 13, 2021


Current data collection practices are biased in ways that leave women and girls invisible in the data, and overcoming the problems of traditional collection methods requires deliberate effort. In a feminist data collection system, averages would be deconstructed through appropriate disaggregation, making intersectional analysis possible, and the use of untapped data sources would be expanded while their challenges are mitigated. New data sources and technologies such as AI offer enormous opportunities, but bias exists in this context too, so deliberate effort is again required to ensure that women and girls are counted. Every aspect of data collection needs to be rethought to make it more inclusive. Unless a feminist approach is adopted through considered effort, data collection will continue to leave women and girls invisible.

An inclusive data collection system combines traditional and untapped data sources on a foundation of deliberate focus on inclusion. Traditional sources require improved survey questions and structures and more inclusive collection practices. Untapped sources can be incorporated to improve on traditional sources and build an inclusive data collection system based on feminist principles. At each stage of the process, however, it is critical to keep asking the right questions:

  • Who defines the problem that data will solve?

  • Who decides what data to collect?

  • Who collects the data and how?

  • Who analyzes and questions the data?

  • Who uses the data and for what decisions?

We first explore the problems in traditional data collection methods and then consider new data sources that can contribute to building an inclusive system, along with the problems those sources pose.

Addressing problems of traditional data collection

Traditional data collection methods, including censuses, labor force surveys, demographic and health surveys, and others, have shortcomings that must be overcome before a feminist approach to data is possible. These sources may contain biases in their design, and a lack of inclusiveness leaves critical gaps in the data they produce. Gender bias can shape both how survey questions are phrased and how they are asked. Data collected at the household level can overlook intrahousehold inequalities. Marginalized groups are often not recorded, which leaves women in the most vulnerable situations uncounted altogether. These problems with traditional sources and methods must be addressed to ensure inclusive data collection.

Survey question design

The design of survey questions shapes the data that are collected and can perpetuate gender bias. Consider labor force surveys whose question structure reinforces traditional gender roles. Questions about a person’s economic contributions ask only about their primary activity, and for women the recorded answer is often “housewife.” No follow-up questions are asked, so paid work that women do as a secondary occupation goes unrecorded. As a result, policy-makers lack an understanding of how women add value to the economy.1 Bias in survey question design needs to be identified and corrected at the outset.
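To make this mechanism concrete, here is a minimal Python sketch using hypothetical respondent records. The field names (`primary_activity`, `secondary_activity`), the activity codes, and the set of codes counted as paid work are all illustrative assumptions, not taken from any real survey instrument. The sketch counts how many women are recorded as doing paid work when only the primary activity is asked about, and again when a secondary activity is also recorded.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical activity codes treated as paid work (an assumption for illustration).
PAID_ACTIVITIES = {"wage_work", "own_business", "paid_farm_work"}


@dataclass
class Respondent:
    sex: str
    primary_activity: str
    secondary_activity: Optional[str] = None  # None if not asked or not reported


def count_in_paid_work(respondents: List[Respondent], sex: str, ask_secondary: bool) -> int:
    """Count respondents of the given sex whose recorded activities include paid work."""
    count = 0
    for r in respondents:
        if r.sex != sex:
            continue
        recorded = {r.primary_activity}
        if ask_secondary and r.secondary_activity is not None:
            recorded.add(r.secondary_activity)
        if recorded & PAID_ACTIVITIES:  # any overlap with the paid-work codes
            count += 1
    return count


# Toy records for illustration only; not real survey data.
sample = [
    Respondent("F", "housewife", "own_business"),
    Respondent("F", "housewife", "paid_farm_work"),
    Respondent("F", "housewife"),
    Respondent("F", "wage_work"),
    Respondent("M", "wage_work"),
    Respondent("M", "own_business", "paid_farm_work"),
]

primary_only = count_in_paid_work(sample, "F", ask_secondary=False)
with_secondary = count_in_paid_work(sample, "F", ask_secondary=True)
print("Women in paid work, primary question only:", primary_only)
print("Women in paid work, with a secondary-activity question:", with_secondary)
```

In this toy sample the primary-only design records one woman in paid work, while adding a secondary-activity question records three. The count for men is two either way, because their paid work was already recorded as their primary activity, so the omission falls almost entirely on women.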

Data disaggregation

Data collected at the household level can overlook intrahousehold inequalities. To identify inequalities within the household, such as in asset ownership, data must be disaggregated by sex, age, income quintile, and other characteristics. Because disaggregated data are missing from major global datasets, standardized measurement tools need to be rethought to make this possible.2
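A minimal Python sketch of why this matters: a household-level indicator (“does anyone in the household own land?”) can look universal, while individual-level rates disaggregated by sex reveal a gap inside those same households. The record layout and the land-ownership values below are hypothetical and chosen only for illustration; the same grouping extends to age, income quintile, or any other characteristic.

```python
from collections import defaultdict

# Hypothetical individual-level records: (household_id, sex, owns_land).
# Values are illustrative assumptions, not real survey data.
members = [
    (1, "M", True), (1, "F", False),
    (2, "M", True), (2, "F", False), (2, "F", False),
    (3, "M", False), (3, "F", True),
    (4, "M", True), (4, "F", True),
]


def household_ownership_rate(records):
    """Share of households in which at least one member owns land."""
    owns = defaultdict(bool)
    for household_id, _sex, owns_land in records:
        owns[household_id] = owns[household_id] or owns_land
    return sum(owns.values()) / len(owns)


def ownership_rate_by_sex(records):
    """Share of individuals who own land, disaggregated by sex."""
    totals = defaultdict(int)
    owners = defaultdict(int)
    for _household_id, sex, owns_land in records:
        totals[sex] += 1
        owners[sex] += int(owns_land)
    return {sex: owners[sex] / totals[sex] for sex in totals}


print(household_ownership_rate(members))  # 1.0
print(ownership_rate_by_sex(members))     # {'M': 0.75, 'F': 0.4}
```

Every household in the toy data “owns land,” so a household-level survey would report full coverage, yet disaggregating by sex shows that 75 percent of men but only 40 percent of women own land themselves.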

Reaching marginalized groups

Marginalized groups are often not recorded, which leaves women in the most vulnerable situations uncounted altogether. Although refugee and conflict-affected populations are most at risk of gender-based violence, conflict situations often impede standard data collection efforts, and where data collection is possible, appropriate methodologies are needed to address ethical concerns and produce useful data.3 Non-binary and transgender people are another marginalized group. Unless data collection practices move beyond binary sex categories, their lived realities and, importantly, gender-based violence against transgender women will go unrecorded. Intersectional data must be made available to highlight the unique challenges facing different marginalized groups.

Expanding the use of untapped data sources

Untapped data sources can help create a more inclusive data system than traditional data collection methods can alone. More systematic use of administrative registers can complement traditional sources such as censuses and demographic and household surveys. The widespread use of mobile phones creates opportunities for citizen-generated data to improve information on general living conditions, and SMS text surveys can provide rapid assessments of evolving situations. Call detail records and other forms of big data can provide more granular, near real-time information, especially in places that traditional surveys cannot reach. Artificial intelligence, in turn, offers significant opportunities to improve processes and reduce the cost of analyzing the available data.

Administrative data

Administrative data sources such as civil registration and vital statistics systems, enrollment records, health records, and others can provide disaggregated data on health, education, labor, and other critical indicators, helping to validate traditional data sources and close their gaps. To harness this potential, countries must put appropriate legal provisions in place. For example, in 2015 Viet Nam adopted a statistical law that allows and regulates the statistical use of administrative data.4 Advocacy will be needed to mainstream the use of administrative records; to help decision-makers see their utility, advocates should map existing administrative registers. However, weak statistical systems and a lack of resources are barriers to using administrative data,5 so advocacy should also address these underlying barriers by calling for adequate financing and stronger statistical systems.

Many concrete examples demonstrate the potential of expanding the use of untapped data sources. Research from Carnegie Mellon University shows how applying data mining techniques to administrative data on reported rapes revealed patterns of sexual violence in El Salvador. Analyses of pairs of variables, visualized as heat maps, answered questions about conditional distributions, such as who the main perpetrators of sexual violence were by age or location. A focus on patterns in time and location could also help identify where and when sexual violence is increasing. Such analysis could be conducted in real time, enabling early detection of emerging patterns to inform law enforcement agencies and policy-makers.6 These analytical techniques are, however, subject to ethical challenges and questions of fairness, which are explored further in chapter 3.
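A heat map of a conditional distribution shows, for each value of one variable (for example, location), the share of cases falling into each category of another (for example, perpetrator age group). The minimal Python sketch below computes such a table from hypothetical records. The field names, categories, and counts are illustrative assumptions, and the sketch is not a reproduction of the data mining techniques used in the cited research; it only shows the basic quantity a two-variable heat map displays.

```python
from collections import Counter, defaultdict


def conditional_distribution(records, given, target):
    """Estimate P(target | given) from a list of dict records.

    Returns {given_value: {target_value: probability}}; each inner dict
    sums to 1. Laid out as a matrix, these rows are what a heat map of
    the conditional distribution would display.
    """
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[given]][rec[target]] += 1
    table = {}
    for g, target_counts in counts.items():
        total = sum(target_counts.values())
        table[g] = {t: n / total for t, n in target_counts.items()}
    return table


# Hypothetical, illustrative records only (not data from the cited research).
reports = [
    {"location": "A", "perpetrator_age_group": "18-29"},
    {"location": "A", "perpetrator_age_group": "18-29"},
    {"location": "A", "perpetrator_age_group": "30-44"},
    {"location": "A", "perpetrator_age_group": "45+"},
    {"location": "B", "perpetrator_age_group": "30-44"},
    {"location": "B", "perpetrator_age_group": "30-44"},
]

table = conditional_distribution(reports, given="location", target="perpetrator_age_group")
for loc, dist in sorted(table.items()):
    # The most frequent category per row answers "who are the main perpetrators here?"
    print(loc, dist, "most common:", max(dist, key=dist.get))
```

Conditioning on location normalizes each row separately, so a location with few reports is not simply drowned out by one with many. Swapping `given` to a time field (such as month) gives the time-based view the text describes, under the same hypothetical record layout.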

Citizen-generated data

Citizen-generated data are made possible by the widespread use of mobile phones. Involving citizens in the data collection process is a major component of data feminism and of altering the distribution of data power. These data can also increase the granularity of data on a wide range of development topics and promote greater inclusiveness. While traditional data collection methods may have limited access to traditionally excluded and economically disadvantaged populations, citizen-generated data can reach these populations with the support of civil society.7 However, while the relevance and potential contributions of citizen-generated data have been recognized, concerns remain about their coverage, comparability, capacity, and sustainability as a supplement to traditional data sources.8 These challenges must be mitigated, but they do not diminish the great potential of citizen-generated data.

The Maternal Child Health Nutrition Improvement Project9 offers an example of harnessing Android phone-based surveys to assess the performance of community performance-based financing for improving maternal and child health and nutrition outcomes. Because paper-based reporting was inefficient, smartphones were used to enable direct reporting from the community on performance, circumventing the time delays, capacity constraints, and data quality challenges associated with traditional data collection methods. Based on the success of this pilot, the Government of Ghana will consider a national scale-up.10

Big data

Big data systems can produce granular, real-time information, offering great potential for shifting data collection towards feminist principles, closing gaps in traditional databases, and providing data where none are available. Untapped sources of big data include social media data, call detail records, radio data, satellite imagery, and others. Data from these sources can help address questions of measurement and validation of traditional sources, correct biases, and provide further disaggregation, all with increased frequency.11 However, legal and technical protocols will need to be developed and implemented with care to address key challenges in interpreting the data and protecting citizens’ privacy.

As an example, the Flowminder Foundation investigated the potential of big data to support the Government of Nepal’s efforts toward gender-equitable development. The project combined geo-tagged survey data from the 2016 Nepal Demographic and Health Survey (NDHS), satellite imagery, and mobile phone data to map gender-related indicators, including literacy, agriculture-based occupations, and births in health facilities. Combining these data allowed researchers to model and map spatial variations and gender-based inequalities in these indicators, providing a better understanding of the lives of women and girls. However, to realize the full benefits of combining data, governments must increase the sample size of the underlying survey data and build capacity within statistical agencies to use these modelling approaches.12

Artificial intelligence

Artificial intelligence offers the potential to improve processes and reduce the cost of analyzing available big data, as well as to advance more accurate synthetic data techniques. Here we define artificial intelligence as any software technology with at least one of the following capabilities: perception (including audio, visual, textual, and tactile), decision-making, prediction, automatic knowledge extraction and pattern recognition from data, interactive communication, and logical reasoning.13 Collecting the right data with methods that ensure appropriate disaggregation is an important first step, but to create a more inclusive data system, these data must also be analyzed and interpreted using appropriate and efficient methods.

Mitigating challenges of untapped data sources

Using data from traditional and untapped sources offers the potential to make data collection systems more inclusive and to help close gaps and overcome bias through validation, but governments and researchers will need to implement the right standards to protect privacy and overcome the biases that remain inherent in the data and research methods. Innovation and technology offer significant opportunities for achieving inclusive data collection practices, but they are not the whole solution, and no single solution will fit all contexts. Each context will have its own challenges that must be mitigated for data collection to be inclusive and to address real-world problems.

Bias in the data and algorithms

While artificial intelligence carries great potential, algorithms built on biased data amplify gender and racial inequalities, spreading unexamined biases. Big data and data science are overwhelmingly based on white, male, and techno-heroic narratives, and these narratives should be challenged. Challenging the male/female binary also opens other biased classification systems to re-examination.14 Biases can do more than perpetuate inequalities: they can put women’s lives at risk. The design of seatbelts and airbags illustrates the dangers of designs based on biased data. Because these designs did not take breasts and pregnant bodies into account, women are 47 percent more likely to be injured and 17 percent more likely to die than men in car accidents.15 Context-specific and gender-specific guidelines need to be established to cover data collection, data handling, and subject-specific trade-offs, and the social narratives underlying these biases need to be examined through further research.16

To help address issues of bias, it is important to examine the context in which data are generated. Who is on the research team and who is asking the questions are critical issues to consider.17

Data interpretation

The combination of traditional and untapped data sources carries potential, but only correct interpretation of these data will produce useful insights. To interpret insights from both kinds of sources effectively, researchers need to consider the social norms and political realities related to gender equality and the challenges facing women and girls. For example, when interpreting social media data, researchers should consider that what women are comfortable saying online may not reflect their actual opinions: digital threats, including privacy risks and online harassment and abuse, may prevent women from expressing themselves freely online.18

The responsible use of big data to fill data gaps and generate critical insights into the lives of women and girls requires respect for individual privacy. While many long-standing data privacy principles are in place for traditional sources, many privacy issues related to big data have yet to be addressed. Because big data sources can be interconnected in so many ways, irreversible de-identification is difficult to guarantee. Even data aggregated to the community level, which minimizes the risk of reidentifying individuals, may still be used to harm identifiable groups.19 Multilateral normative frameworks are needed. The United Nations Sustainable Development Group (UNSDG) has developed a set of guidelines on data privacy, data protection, and data ethics concerning the use of big data; these guidelines emphasize the importance of the knowledge and proper consent of the individuals about whom data are collected.20 However, governments must ensure that appropriate legal and technical protocols are developed for their particular context and implemented to protect privacy.
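One common, though partial, technical safeguard against individual reidentification is k-anonymity: no combination of quasi-identifiers (attributes such as district, age band, and sex that are not names but can single people out when combined) should be shared by fewer than k records. The minimal Python sketch below checks this property on hypothetical records; the field names, values, and choice of quasi-identifiers are illustrative assumptions. It illustrates the point above rather than resolving it: passing the check says nothing about linkage with other datasets that add new quasi-identifiers, nor about harms to identifiable groups, which persist even when no individual can be singled out.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest set of records sharing identical values on all
    quasi-identifier fields (0 if there are no records)."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical records; field names and values are illustrative assumptions.
records = [
    {"district": "North", "age_band": "20-29", "sex": "F"},
    {"district": "North", "age_band": "20-29", "sex": "F"},
    {"district": "North", "age_band": "30-39", "sex": "F"},
    {"district": "South", "age_band": "20-29", "sex": "M"},
    {"district": "South", "age_band": "20-29", "sex": "M"},
]

qi = ["district", "age_band", "sex"]
print(smallest_group_size(records, qi))            # 1: one record is unique
print(is_k_anonymous(records, qi, k=2))            # False
print(is_k_anonymous(records, ["district"], k=2))  # True
```

Coarsening or dropping quasi-identifiers (here, keeping only `district`) raises the smallest group size, but at the cost of the very disaggregation this article calls for. In the toy data, knowing the district alone still reveals everyone’s sex, a small example of how a whole group can remain exposed after individuals are protected.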

It is critical to involve the communities about which data are being collected in the process and to give them the opportunity to provide informed consent for the use of their data. Data collection methods need to include transparency and education about the risks and benefits of data use and non-use. Researchers should conduct risk assessments in each context and ensure that the consent provided is well-informed.21

Scaling up for impact

Many projects have demonstrated the potential of using big data to close gaps and provide data where none are available. However, most of these examples are one-time research projects or pilots. For the full benefits of these untapped data sources to be realized, projects must be scaled up and widely implemented. Governments need to take an active role in harnessing the potential of these sources and adopting feminist approaches to data collection. In many contexts, governments will need to invest in building the capacity of technical staff to harness innovative technologies and untapped data sources. Delivering policies that leave no one behind will require greater investment in data.

Public-private partnerships offer further opportunities by harnessing private-sector big data for the public good. If data stewardship is established as a function within private companies, data collaboratives between governments and businesses can become more predictable and sustainable.22


Changing data collection practices will not be a simple or easy process. Current inequalities must be examined, challenged, and changed. Realizing more inclusive data systems will require a collective shift in data collection methods and a challenge to existing power structures. The right legal and technical protocols to mitigate the challenges of both traditional and untapped data sources will be essential. But only through concerted effort and dedication to inclusion can feminist data collection be implemented in a way that ensures no one is left behind.

References

Buvinic, Mayra and Levine, Ruth. 2016. Closing the gender data gap. Royal Statistical Society.

CIVICUS. n.d. DataShift: Building the capacity and confidence of civil society organizations to produce and use citizen-generated data.

Collett, Clementine and Dillon, Sarah. 2019. AI and Gender: Four Proposals for Future Research. Cambridge: The Leverhulme Centre for the Future of Intelligence.

De-Arteaga, Maria and Dubrawski, Artur. 2017. “Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador.” arXiv:1711.06538 [stat].

D’Ignazio, Catherine and Klein, Lauren F. 2020. Data Feminism.

Goudsmit, Jeroen and Midgley, Linda. 2019. “Measuring impact on the SDGs through AI.” The PwC Blog.

GovLab. n.d. Data Stewards.

Lopes, Claudia Abreu and Bailur, Savita. 2018. Gender Equality and Big Data. UN Women.

Neuromation. 2018. “AI and Deep Learning Are the Keys to Unlocking the Future Of Environmental Sustainability.”

OECD. 2018. Development Co-operation Report 2018: Joining Forces to Leave No One Behind.

Perez, Caroline Criado. 2019. Invisible Women.

The Flowminder Foundation. 2019. “Towards high-resolution sex-disaggregated dynamic mapping.”

The Global Women’s Institute. 2017. Gender-Based Violence Research, Monitoring, and Evaluation with Refugee and Conflict-Affected Populations: A Manual and Toolkit for Researchers and Practitioners. George Washington University.

UNSDG. n.d. “Data Privacy, Ethics and Protection: Guidance Note on Big Data for Achievement of the 2030 Agenda.” Accessed 24 March 2021.

Vietnam National Assembly. 2015. Law on Statistics. Hanoi.

Vinuesa, R., Azizpour, H., Leite, I., et al. 2020. “The role of artificial intelligence in achieving the Sustainable Development Goals.” Nature Communications 11, 233.

World Bank. 2014. “Ghana - Maternal, Child Health and Nutrition Project.” Project ID P145792.

World Economic Forum. 2018. “Assessing Gender Gaps in Artificial Intelligence.” Global Gender Gap Report 2018.
