
Design of Data Science Projects for Inclusive Public Policies

We update here the section of the book entitled "Incubating Feminist AI", which originally included the summary of the first three projects selected in the homonymous call for proposals and now offers the final versions of the seven LAC projects incubated so far.

Published on Jul 10, 2023

Abstract

The purpose of our study is to reformulate and validate changes to a design methodology for data science projects for public officials based on collective contributions. The analysis integrates approaches from the fields of data justice, intersectionality and public design justice. Promoting a critical implementation of data science in the public arena, grounded in awareness of structural changes in public decision-making, serves as a non-technical instrument and an alternative technology for rethinking AI and inclusive data science. Efforts to bridge existing data gaps, reconcile public innovation agendas with feminist AI, and raise awareness of new rights and knowledge matrices are the elements taken into account in reviewing the design of public data-intensive projects.

The research draws on inputs from two workshops with public officials and activists from various fields in the region, in which changes to a project design datasheet used in Chile and published in an IDB guide for Latin America were explored and validated. As a result, we present a reformulated project datasheet based on a first internal review, collective contributions from online meetings and a subsequent analysis along the proposed exploratory dimensions. The main changes are defined by the need for participatory instruments at the different design stages, a project and data governance strategy, and a cross-cutting, iterative feminist data justice vision.

Introduction

In the context of the status of data governance and Artificial Intelligence (AI) policies, Latin America offers an extremely pertinent scenario for exploring the current practices carried out by the State in its efforts to include the field of data science in public policy design. In this scenario, international law sources, public innovation policies and others emerging from feminist data activism converge in debates and various actions. International projects under the Roadmap for Digital Cooperation (UN, 2020),¹ open government policy developments with digital transformation or gender focuses (OGP, 2021),² civil society research (EmpatIA,³ Derechos Digitales,⁴ Red Fair⁵) and recommendations emerging from the most recent work conducted by the Global Data Barometer⁶ provide evidence of the need to deepen the regional feminist AI agenda.

  1. Naciones Unidas, Asamblea General “Hoja de ruta para la cooperación digital: aplicación de las recomendaciones del Panel de Alto Nivel sobre la Cooperación Digital” Informe del Secretario General, A/74/821, (May 29, 2020) Available at https://documents-dds-ny.un.org/doc/UNDOC/GEN/N20/102/54/PDF/N2010254.pdf?OpenElement

  2. Open Government Partnership (2021) Digital Governance Fact Sheet https://www.opengovpartnership.org/wp-content/uploads/2021/11/Digital-Governance-Fact-Sheet.pdf

  3. Iniciativa Inteligencia Artificial para el desarrollo en América Latina https://www.empatia.la/

  4. Proyecto Inteligencia Artificial e Inclusión en América Latina https://ia.derechosdigitales.org/

  5. Red Fair LAC (Nov 1, 2022), INTELIGENCIA ARTIFICIAL FEMINISTA Hacia una Agenda de Investigación para América Latina y El Caribe, Available at https://archive.org/details/inteligencia-artificial-feminista/page/n1/mode/1up

  6. Barómetro Global de Datos (2022). Primera Edición Informe – Barómetro Global de Datos. ILDA. DOI: https://doi.org/10.5281/zenodo.6488349

The particularity of the Latin American feminist AI agenda requires interdisciplinary, multi-level and multi-actor work consistent with data justice approaches, fair design and inclusive data science. Discursive tensions regarding what data justice is (Dencik, 2022⁷; Draude, 2022⁸; GPAI, 2022⁹; Hintz et al., 2022¹⁰) are reflected in the design of data science projects in the public sector through struggles for rights, knowledge, resources and ethical agendas.

In turn, these tensions are deepened by fragmented and dispersed routes in localizing regulatory frameworks or actions at four levels: a) collection and use of data (statistical, open); b) public policy design based on bulk data or automated decision-making; c) knowledge and practices in the field of data science, AI or big data; and d) underlying governance and participation approaches. The struggles and intersections among these levels offer various scenarios in which it is crucial to discuss the meaning of inclusion, inequality and justice in light of intersectional approaches in the public sphere (Klein & D’Ignazio, 2020¹¹). Exploring the structural inequalities behind a project’s design practices entails a shift to the macro level of a public policy’s conception, implementation and assessment. Consequently, our objective is to inquire into how the "Data Science for Public Managers" model and methodology, currently used by the IDB in its manual for responsible AI project formulation,¹² allow us to explore the meaning of inclusion by validating new processes for Latin American feminist data science.

The joint idea of exploring changes in the resources derived from the “Data Science for Public Managers” initiative of GobLab¹³ emerges from studies and interviews for research on public innovation, feminism and data science in order to suggest a future imaginary¹⁴ regarding data science for inclusive public policies. This imaginary is presented based on a methodology that integrates review proposals for the resources in question, validation with experts in the region, and the analysis and proposal of a new resource based on three new dimensions.

First of all, the study presents three main analysis dimensions and examines global approaches to the use of data and bulk data¹⁵ as a general context instrument. Data justice, participatory design and an intersectional approach result in a matrix of changes to rethink the design process of a data science project for public policy that is replicable in the region. We based the proposal on the review of an active resource already tested by various officials in the public sphere in Chile and Mexico, in order to modify it according to emerging and necessary approaches so that it can be replicated and reused. This is also for the purpose of expanding the collectives potentially interested in its use, promoting awareness about the challenges of implementing data science projects without a fair and inclusive perspective of rights, practices, or participation or auditing processes.

  7. Dencik, L., & Sanchez-Monedero, J. (2022). Justicia de datos. Revista Latinoamericana de Economía y Sociedad Digital.

  8. Draude, C., Hornung, G., & Klumbytė, G. (2022). Mapping Data Justice as a Multidimensional Concept Through Feminist and Legal Perspectives. In New Perspectives in Critical Data Studies (pp. 187-216). Palgrave Macmillan, Cham.

  9. GPAI (2022). Data Justice: Data Justice in Practice: A Guide for Policymakers, Report, November 2022, Global Partnership on AI

  10. Hintz et al. (Jun 2022) Civic Participation in the datafied society: towards democratic auditing? Available at: https://datajusticelab.org/wp-content/uploads/2022/08/CivicParticipation_DataJusticeLab_Report2022.pdf

  11. Catherine D'Ignazio; Lauren F. Klein, "Introduction: Why Data Science Needs Feminism," in Data Feminism, MIT Press, 2020, pp.1-19.

  12. Denis, G., Hermosilla, M., Aracena, C., Sánchez Avalos R., González Alarcón N., Pombo, C. (2021) Uso responsable de IA para política pública: Manual de formulación de proyectos. Retrieved from https://publications.iadb.org/es/uso-responsable-de-ia-para-politica-publica-manual-de-formulacion-de-proyectos

  13. Created in 2019 by GobLab UAI in partnership with the Center for Data Science and Public Policy of the University of Chicago. A project datasheet and a data maturity matrix tested with 80 people in leadership positions in the Chilean government emerged from it. Subsequently, the methodology continued to be used and iterated in Chile in 2020 and 2021, and new methodological supports such as a glossary and four project datasheets were created. Finally, in 2021, the IDB included the methodology in its responsible artificial intelligence project formulation manual (see: https://publications.iadb.org/es/uso-responsable-de-ia-para-politica-publica-manual-de-formulacion-de-proyectos).

  14. Imaginary proposed as a dynamic at the workshops.

  15. Bulk data is the term most frequently used in studies on public innovation and related studies. However, throughout our research, we found the word macrodata used as a synonym in international regulations.

Then, the various modifications to be made at the design stages of a data science project are presented based on the input obtained from three online workshops held in October and November 2022, considering the objective of the study.

Finally, a new resource is proposed that emphasizes the cross-cutting nature of the premises that emerged from the data justice approach and participatory design practices throughout the project’s cycle. Stages inherent in the suggestions and potentialities of feminist data activism are added, and the need to offer a project governance strategy that includes data governance in particular is incorporated.

Some analyzed inputs that may be useful for a future piloting stage were omitted from this document, such as methodological proposals for identifying appropriate participatory processes with inclusive approaches, and the strengthening of the data governance strategy according to modalities inherent in local public processes.

Problem Statement

Designs of data science projects for public policy in the region often have shortcomings as a result of multiple issues. Innovation policies disconnected from local agendas, lack of funding and training, absence of regulatory frameworks or data governance, and the persistence of silos, among other variables, contribute to perpetuating inequalities or basic asymmetries that a data science project then reproduces. Based on a review of the current project datasheet, we explore a specific action proposal to incorporate dimensions of localized data justice, participatory designs and intersectional approaches into the design of public data science projects.

Methodology

The study’s methodology is hybrid. It is based on the analysis of documents and projects related to three key focuses or dimensions: the discursive evolution of bulk data, AI and data science at a global intergovernmental level (particularly in the United Nations), data justice and participatory design in the public arena, and feminist data science.

These three focuses were used to review the data science project design and data maturity datasheet. Potential changes were first explored through a qualitative process conducted with three (3) researchers and educators from GobLab UAI. From there, various dynamics were devised and presented to people working in open government, data science, feminist activism and public policy in the region. These dynamics were captured in several Jamboard frames in order to hold two online workshops on November 14th and 15th, 2022.¹⁶

Invitations to leading experts in the region were drawn from a list of selected organizations, researchers and activists, with the help of the study’s collaborators: the Latin American Open Data Initiative (ILDA, Iniciativa Latinoamericana por los Datos Abiertos), CIETCI and Abriendo Datos Costa Rica. Twelve (12) participants contributed their opinions about the current and future situation of data science for the public sector, the regional data scenario, the challenges of team formation and a future imaginary for project design. The input was analyzed based on the study’s objectives and the review of the Project Datasheet (Appendix 3). Finally, based on the initial reform proposal for the Datasheet, the input from the workshops and a detailed documentary analysis, a new Project Design Datasheet was prepared. The Datasheet has also been graphed for methodological and reading purposes.

  16. Prior to the implementation of the two workshops with experts external to the project, a brief descriptive document about the project with useful reading resources was prepared.

  1. Elements and Dimensions for a Public and Inclusive Data Science

Data science emerges as a discipline (Rodriguez, 2017¹⁷), as a set of practices for data management and analysis, but also as a public management competence or skill. It needs to be discussed in this section in order to outline the analysis dimensions of the workshops and the future design action dimensions. Therefore, it is relevant to characterize the journey of big data and data science in international policy frameworks as a necessary context variable, the course of the transformations in public administration inspired by such agendas and, finally, the relationship of data science with diversity, inclusion and gender approaches.

These elements allow us to contextualize the relevance of the dimensions included in the project’s internal analysis regarding the design stages of a data science project and, subsequently, the resulting contributions from the workshops with leading experts in the region in topics related to public policy, gender and data.

International Data Frameworks: Statistical, Open and Hybrid

First, we retrieved milestones consistent with the progression of bulk data in the international system represented by agencies and organizations of the United Nations as an instrument that allows positioning the discourses and practices of data science in public policy. Although technological developments have had a predominant role in data discussions in the United Nations, the conflicting narratives will offer an enabling ground for state changes. This is a first interpellation for which we incorporated this contextual data overview into the study.

Two decades ago, the demand for greater efficiency in the organization’s coordination results and the scarcity and lack of systematization of timely data to respond to the biggest global challenges (disasters, climate change, development) promoted the data revolution narrative. In parallel, public administrations started to promote open government policies. Coupled with movements such as free software, this caused an intensification of highly relevant discussions in bodies such as the United Nations General Assembly, programs of the Secretariat or specialized agencies of the organization regarding automated data techniques and practices.

According to an interview with the director of UN Global Pulse¹⁸ in 2014, data science is important for accelerating innovation and limiting the obstacles for development policies and humanitarian actions. That same year, the Independent Expert Advisory Group on a Data Revolution for Sustainable Development¹⁹ called for a data revolution in order to have more evidence for decision-making in view of the lack of data collected for the Millennium Goals.

Since 2018, thanks to the Bogota Declaration (2018),²⁰ data science started to be mentioned in training sections, but also on the websites of the expert groups in question. The discussion inside the walls of the UN is such that today the terms data science and big data²¹ have been added to the group’s traditional name to channel the necessary developments for official statistics and adhere to global narratives.

  17. Rodriguez, P et al (Jul 2017) El uso de datos masivos y sus técnicas analíticas para el diseño e implementación de políticas públicas en Latinoamérica y el Caribe. Resumen de Políticas N IDB-PB-266. BID. https://publications.iadb.org/publications/spanish/viewer/El-uso-de-datos-masivos-y-sus-t%C3%A9cnicas-anal%C3%ADticas-para-el-dise%C3%B1o-e-implementaci%C3%B3n-de-pol%C3%ADticas-p%C3%BAblicas-en-Latinoam%C3%A9rica-y-el-Caribe.pdf

  18. UN Global Pulse was created in 2009 at the request of former Secretary General Ban Ki Moon. The mission of Global Pulse is to promote innovation to give better global responses with the contribution of bulk data (structured and unstructured). The first projects were developed under the denomination “data driven emergencies”, in which digital footprints from cellphones and geospatial data were used to identify food security patterns, population movements, droughts, etc. See official website of UN Global Pulse https://www.unglobalpulse.org and quoted interview at https://www.jstor.org/stable/27000934

  19. See https://www.undatarevolution.org/

  20. For further detail, see the Bogota Declaration, which emerged from the Fourth Global Conference on Big Data for Official Statistics, available at https://unstats.un.org/unsd/bigdata/conferences/2017/Bogota%20declaration%20-%20Final%20version.pdf

A second interpellation from the overview of the global UN structure is provided by demands regarding the nature and sources of data. Source diversity stems not only from the type of data (whether it is structured or not) or from its technical-technological connotation. It is also related to the actors that give rise to it: government or public, private, civil society, academia.

The document “A World that Counts”²² asserted the need to involve various actors in data collection, given the inadequacy, absence and lack of timeliness of data for monitoring the Millennium Goals (later Sustainable Development Goals). Over time, the need for more numerous and hybrid²³ sources of evidence for decision-making has been reinforced both by the evolution of data opening and activism practices from civil society and, to a lesser degree, by universities’ open research data.

Paris21 (2020) proposes a quality framework to promote this data alternative where official data is lacking. In this case, quality is understood according to criteria of relevance, accuracy, credibility (trusted data), timeliness, access, interpretation and coherence. The search for standardization, interpretation and varied disaggregation will provide further inputs to governance frameworks and to the design of data science and AI projects. On the other hand, the search for various sources is an indispensable element in open government or public innovation policies (Brussa in Red Fair Lac, 2022).²⁴ In fact, the Global Data Barometer published in 2022 refers to new formats that lead to better collaboration for data co-production, its publication or automatic harvesting in national or local open data portals. Its methodology also emphasizes the relationship between sources and gender.

Data source diversity is not only a matter of quality attributes, but also of equity. The study conducted by the Global Partnership on AI²⁵ includes the pillar of equity in AI projects based on actions that consider the populations involved in data collection processes. Therefore, the types of data and their origin are elements that impact the design of data science projects. This is based on reviews of aspects related to data quality, integration of interpretability criteria, transparency modalities, team formation and reflections on the data cycle.

A third interpellation is distinguishing the agendas, spaces and organizations that currently include data science in their regulatory, documentary and prospective work. The shift toward and acceleration of bulk data use for the development agenda (the 2030 Agenda) is growing and will have various impacts on the dimensions analyzed in the study regarding the nature of data for public policy design, the governance of this data, the advancement of AI and its adjustment at a global level, but also in terms of gender. Data science, team formation and skills respond to the same public demands, but they are often seen as separate realms of action, implementation or assessment.

  21. The pertinent changes regarding narrative shifts at the UN can be accessed on the following platform: https://unstats.un.org/bigdata/un-global-platform.cshtml

  22. A datacentric narrative becomes evident from the title and content of the document, which is important to reference to account for the progress and setbacks that will be vital for feminist data activism and for the assessment of the context of global policies on the subject analyzed in the study. The document in English is available at https://www.undatarevolution.org/wp-content/uploads/2014/11/A-World-That-Counts.pdf

  23. The relevant documentation about the search for hybridness in official data sources and the use of data generated by civil society can be found in the Bern Data Compact. Document available at https://unstats.un.org/sdgs/hlg/Bern-Data-Compact/

  24. Brussa, V (2022) En agenda: aportes de gobernanza y ciencia de datos para una IA feminista y latinoamericana. In INTELIGENCIA ARTIFICIAL FEMINISTA Hacia una Agenda de Investigación para América Latina y El Caribe, Available at https://archive.org/details/inteligencia-artificial-feminista/page/n1/mode/1up

  25. GPAI (2022). Data Justice: Data Justice in Practice: A Guide for Policymakers, Report, November 2022, Global Partnership on AI.

The action plans of the Open Government Partnership²⁶ or indexes such as the Global Data Barometer do not provide specific guidance on the use of data science for public policy. The latter only mentions the use of data science in the MENA region as a finding or as an indirect allusion under digital governance²⁷ in relation to algorithmic transparency projects.

The situation is different when documents, training courses or events from some public innovation labs (Brussa, forthcoming)²⁸ are considered, in which data science is mentioned as an emerging and necessary field for designing new public policies. Training aspects are mentioned here, but there is still a long way to go regarding the use of techniques and the implementation or design of projects in the area of public policy. On the other hand, there is no clear evidence of an alignment between openness policies and the references made at an international and national level to the transformations of statistical offices.

Consequently, data science is a matter of experimentation policies in the context of public innovation and related labs, mainly in Chile (GobLab UAI) or Colombia (EIP),²⁹ or with mixed characteristics in Argentina or Uruguay based on the creation of data science or AI centers.³⁰

A final interpellation refers to thematic agendas, in which gender, diversity and inclusion are key. As examined above, themes related to disasters or emergencies have, to a great extent, driven the international structures, organization charts and declarations on big data, data science and, most recently, AI. It is there that a gender perspective or data feminist approaches have also driven pertinent changes to identify existing data gaps, and opened discussions about rights (privacy, ethics, justice) and the characteristics of data beyond disaggregation. At the international United Nations level, the evolution of the topic has been a pendulum movement between the development and humanitarian pillars. Gender issues were first a cross-cutting topic and then a specific issue for the organization in terms of data.³¹ ³² ³³

In summary, the journey through the four interpellations of context at a global level (especially regarding the UN³⁴) provides an indispensable starting point for a necessary inflection in the design of inclusive public policies and data science. Therefore, a) the evolution of international frameworks and structures that need to be localized shows that we have a great number of regulations, yet they are fragmented and scattered or lack trust in their implementation (Global Digital Compact, 2020³⁵; Our Common Agenda, 2021³⁶); b) advancements in technology, technique and the widespread use of data, as well as new fields such as data science, have not prevented the absence of or gaps in the data necessary for greater inclusion; c) the existence of thematic agendas with fragmented data or work in silos continues to limit joint action in the face of the new challenges of AI or gender and data science; d) the demand for data source diversity is ultimately a reality. Each item questions the cycle of data science project design in terms of actions regarding data justice, fair design, and the characteristics of data use models and practices.

  26. Website https://www.opengovpartnership.org/

  27. OGP (2021) Fact Sheet Digital Governance https://www.opengovpartnership.org/wp-content/uploads/2021/11/Digital-Governance-Fact-Sheet.pdf

  28. Brussa, V (forthcoming). Data Feminismxs de Laboratorio: Brechas y Tensiones en las Agendas de Innovación y de Datos Iberoamericanas. Revista Feminismos. Universidad de Alicante.

  29. See https://eipdnp.medium.com/innovaci%C3%B3n-p%C3%BAblica-con-ciencia-de-datos-673cbbdc39d6

  30. In Argentina, organizations such as Fundar, CIETCI and Fundación Sadosky promote projects in accordance with data science based on various action levels and theoretical-practical conceptions. In Uruguay, KHIPU is a relevant example of progress in this issue.

  31. The intersection among gender, diversity and women in relation to data is often subsidiary in the mission of UN Women. One of its actions, entitled Research and Data, refers to its activities in gender indicators collaboration. https://www.unwomen.org/es/how-we-work/research-and-data

  32. See https://www.unescap.org/sites/default/d8files/event-documents/The_Inclusive_Data_Charter_Global_Partnership_for_SDGs_Stats_Cafe_13Jan21.pdf

  33. Chisaka, T, (2021), Five tips to promote data inclusivity [blog], Data4SDGS https://www.data4sdgs.org/index.php/blog/five-tips-promote-data-inclusivity

  34. The UN was selected as an exemplary space for intergovernmental context debate. Analyzing it in terms of its data representativeness or narrative level is a separate matter.

  35. Office of the Secretary-General's Envoy on Technology (2020). Global Digital Compact. https://www.un.org/techenvoy/global-digital-compact

  36. UN (2021) Our Common Agenda, Report of the Secretary-General of the United Nations. https://www.un.org/en/content/common-agenda-report/assets/pdf/Common_Agenda_Report_English.pdf

Data Feminism and Data Science

The demands and challenges at a global data level impact the need to examine the field of data science from structural transformation approaches and practices. Feminist data science is one of them, and it is indispensable for designing public policies that hold inclusion values, principles or public action.

The purpose of data feminism practices is to expose inequities, inequalities and injustices based on data science in order to transform and change the dimensions of power. The intervention runs throughout the project’s cycle, throughout the data’s cycle and, in this case, throughout the design cycle of data science projects for public policy if they are to be called inclusive. As a methodological exercise, there is a meso and a micro level of local governance (Brussa, 2022),³⁷ because how we design a data-based public policy and the criteria, principles and data science practices used are interrelated. A feminist data approach and intersectional thinking must expose, make visible and influence how we think of and implement project design thinking. As a result, the design stages of both the project and the policy will be affected. How we position ourselves before the problem, determine what databases, models or techniques are included, and widely discuss not only the quality of data but also the thematic domain based on expanded knowledge are intrinsic elements of a different, possible and inclusive data science.

Taking the principles of Klein & D’Ignazio (2020) as a first layer of distinctive elements for an inclusive data science, we chart the course for the analysis of the workshop contributions that will modify the Data Science Project Design Datasheet (GobLab UAI, 2022).³⁸ First of all, data feminism is not only about women but about power, inequity and intersectionality. Second, it examines design thinking from the perspective of action. Finally, it thinks of action, and therefore design, in an iterative manner.

At the micro level of inclusive design thinking, it is also necessary to consider alternative proposals in case models are used (Eitzel, 2022),³⁹ identify participatory practices and methodologies for the problem prioritization stages and, even more, establish data (or project) governance a priori (Mozilla, 2020).⁴⁰ This final aspect is something that must be resolved both at the meso level of public policies and the micro level of data science projects, as it is practically a non-existent exercise.

Public Policy Design and Data Science

At a Latin American level, and based on research conducted by CAF (2021),⁴¹ Argentina, Brazil, Chile, Colombia, Mexico, Peru and Uruguay have developed or are developing national strategies for artificial intelligence and data. These are generally characterized by including ethical aspects, in which the mitigation of biases, non-discrimination and the prevention of harm acquire particular relevance.

  37. Slide presentation Local Governance Talk of FAIR network (Spanish version) https://zenodo.org/record/7600135#.Y9veUnbMLIU

  38. The Datasheet may be downloaded from https://goblab.uai.cl/proyecto-curso-ciencia-de-datos-para-directivos-publicos/

  39. Eitzel, M (2022) Writing a “modeler’s Manifesto” for More Transparent, Ethical Data Science https://datasciencebydesign.org/blog/writing-a-modelers-manifesto-for-more-transparent-ethical-data-science

  40. Data Future Lab (2020), Who is Innovating? | Global Landscape Scan and Analysis of Initiatives. Available at https://foundation.mozilla.org/en/data-futures-lab/data-for-empowerment/shifting-power-through-data-governance/

  41. CAF. (2021). Experiencia: Datos e Inteligencia Artificial en el sector público. Caracas: CAF. Retrieved from http://scioteca.caf.com/handle/123456789/1793

Beyond the regulations or strategies promoted by Latin American governments in recent years, there is an ecosystem of relevant actors and catalysts for data science and artificial intelligence projects in the public sector in Latin America. An example of this is the fAIr LAC initiative for responsible artificial intelligence of the IDB Group, a public-private network of universities, companies, civil society and state organizations whose purpose is the development and use of Responsible Artificial Intelligence. To date, the initiative has funded over 600 projects mainly related to the reform and modernization of the State.⁴² In collaboration with GobLab UAI, it has published reference frameworks about ethical data management and a manual for formulating artificial intelligence projects for public policy.

Another important effort in Latin America, which is supported by the IDB’s fAIr LAC, is the aforementioned EmpatIA project. Developed by ILDA and the Latam Digital Center, its purpose is to understand and stimulate the development of artificial intelligence use policies and training from an ethical and intersectional perspective. An example of this is the project “Inteligencia Artificial para predecir el nivel de la calidad del aire en Chile” (Artificial Intelligence to Predict the Air Quality Level in Chile), jointly carried out by Chile’s Superintendence of the Environment and GobLab of Universidad Adolfo Ibáñez for the purpose of implementing a tool that can predict the air quality level in an industrial area. The projects of the Latin American Open Data Initiative (ILDA)⁴³ are along the same line, but with a particular emphasis on diversity, inclusion and ethics. The research on the standardization of gender data is of particular interest for the transformations promoted in the region’s public administrations. The exercise requires skills and public policy strategies derived from best practices in the recording of official data. It is in this area that global interpellations debate on the regional work regarding progress in data attributes such as quality and equity.

42 See https://fairlac.iadb.org/

43 ILDA is an organization focused on building relationships, knowledge and tools to support data-based inclusive development in Latin America based on 4 strategic focuses: Community, Gender and Inclusion, Emerging Technologies, and Transparency and Governance. In particular, ILDA's gender and inclusion focus seeks to make visible the groups or people who are not included in data and/or algorithms.

    1. Data Justice, AI and Inclusive Data Science

FIG 1. Shifts of Data Science based on Data Justice

Source: Print Screen Slides Brussa, V (2022) in MESA “Gobernanza y políticas públicas en ciencia de datos para una IA feminista en América Latina”. Red FAIR

The concept-action of data justice implies a shift in practice in the field of data science by delving into intersectional thinking and the risks of datafication. This first shift has an emphasis on a data justice approach rather than on ethical frameworks as a transformative premise for public policies. Without a substantial analysis of the who, how and why of data, basic inequities could be reproduced.

As detailed in the first section, data justice deserves to be considered when carrying out a project design, as suggested by the study entitled “Advancing Data Justice Research and Practice Project” of the Data Governance Working Group of the Global Partnership on Artificial Intelligence (GPAI). In this document, the data justice action pillars can be adapted to our region considering the limitations, challenges and opportunities related to the field of data science in the public sector. The principles of data feminism become explicit in the proposal of six pillars for the relocation of data justice. In particular, the Power pillar refers to data science and artificial intelligence tools as epistemological techniques with regulatory impacts on the subjects (GPAI, 2022). Consequently, in the pillar about knowledge, it is stated that data science and innovation must have a public commitment to ensure social wellbeing (GPAI, 2022).

It is necessary to clarify that the data justice approach presented by the document stresses its usefulness in the field of innovation policies and therefore deserves to be discussed in the particular context of the project developed here in view of the shortage of data science policies from public innovation areas, the lack of communication among innovation movements (open government, digital, open data, gender) or the insufficient government capacity to strengthen data literacy processes with an interrelational and intersectional perspective.

Latin American Paths for Data Science: Action Dimensions

In this section, developments in the international data arena, together with the incentives contributed by data feminism and data justice, are considered as three specific analysis dimensions. The DATASHEETS for designing inclusive data science projects for public policy are discussed based on them. Three workshops have been held to inquire into the main contributions from the principles of data feminism, fair design and data justice on one side, and the necessary localizations of global proposals on the other, that could be materialized as challenges or triggers for the concrete transformation of a Datasheet created by GobLab UAI.

Three dimensions to be validated with the collective contributions emerge from the internal work conducted with researchers from GobLab UAI in which a scheme of substantial, subsidiary and narrative44 changes was proposed.

  1. Examining the experiences, paths and statuses about what is understood by data science project design.

  2. Exploring the data scenario through a discussion of variables such as access, governance, capabilities and characteristics.

  3. Characteristics of the teams that devise, develop and implement data science projects.

A fourth, cross-cutting dimension was added: the collective identification of an imaginary regarding fair design in data science.

It is worth highlighting that at this first stage of the project, the objective of the study45 is to propose changes to a Project Design Datasheet in order to provide a resource informed by other approaches. An initial question raised during an interview, “what would happen if we added a gender perspective to the Datasheet?”, was the trigger for exploring changes, contributions and future actions. Then, the intersectional, data justice and fair design perspectives were integrated.

The following paragraphs summarize the contributions that emerged collectively from the workshops and propose the changes worth implementing in the Project Design Datasheet. Changes derived from the analysis of the study’s authors may also be added in due course in order to ultimately provide future action guidelines.

Data Science Project Design for Inclusive Public Policies

FIG 2. Original Datasheet

44 See matrix of changes proposed and discussed in the first workshop. Methodological Appendix 1.

45 The dimensions are based on discussions that could be part of a prototyping and piloting stage in which to delve into:

    • Quality of data within the context of the analyzed principles and pillars

    • Methodologies and moments of project co-production/co-design

    • Guidelines for adopting a data governance strategy when implementing a data science project

This datasheet was first reviewed based on proposals from the Program for Cooperation and Research in Technologies, Data and Society (Codatecs, Programa de Cooperación e Investigación en Tecnologías, Datos y Sociedad), and then internally by GobLab UAI researchers who discussed the proposed transformations in two workshops. From these workshops emerged the key dimensions, which became methodologies for exploring, with actors of the region, the current situation of data science projects, issues regarding the data to be used in the projects, the formation of the teams that develop, analyze and implement them and, finally, opinions about what an inclusive data science project should look like.

Consequently, the purpose of the workshops was to collectively validate the findings and transformation proposals for the Datasheet required for a revision that contains the approaches of data justice, fair design, and the new public policy designs with a data science and feminist data activism perspective.

Fig. 2 graphically illustrates the Datasheet46 based on the stages for designing a data science project for public policy.

Some points that are important to highlight are:

  • The Datasheet places an emphasis on project stages and not strictly on the data lifecycle. The latter is found in a datasheet entitled Data Maturity Matrix, which was also reviewed for potential changes with a data justice and data feminist perspective.

  • The Design Datasheet is meant to be used by government officials, particularly at a national level. Therefore, the stages are conceived in relation to a project for designing public policies, its issues and actors.

  • The Datasheet was conceived for the Chilean context based on a datasheet designed in the US and used by the Data Science for Social Good Fellowship47 in various parts of the world. It is currently being used in Latin America thanks to the IDB’s guide, and in consulting and teaching projects in other countries like Mexico.

46 Remember that the reviewed Design DATASHEET as well as the data maturity one are the ones published in the original Project Design Datasheet designed by GobLab UAI, available at https://goblab.uai.cl/proyecto-curso-ciencia-de-datos-para-directivos-publicos/

47 See https://www.dssgfellowship.org/

As described in the methodological section, the workshops were designed to offer a dialogue, the first with public administration leaders (GROUP1_State) and the second with feminist data activism leaders, data experts and civil society organizations (GROUP2_Activism).

The need to review the Project Design Datasheet will be subsequently analyzed based on the contributions of GROUP1_State and GROUP2_Activism, considering the study’s objective: How to think about the design of data science projects for inclusive public policies?

As a resource for the visual reading of the analysis, we use four icons to identify each dimension. The contributions and potential transformations for project design based on the study’s concepts, its objective and collective ideas will be described where the icons are presented.

  1. Findings and Results for Inclusive Data Science Action

These figures (visualizations of the datasheet) are under construction. They are a preliminary and open redesign of the datasheet.

First Dimension: How Are We Doing in the Region?

FIG 3. Reviewed Project Datasheet –Data Science & AI Projects

The main challenge highlighted by the workshop participants is the absence of human, infrastructure, organizational, regulatory and data literacy resources with a gender perspective, together with the lack of political will to channel the potential of data science in public management with a long-term perspective. This limitation affects the definition of the problems that can be analyzed with the new data practices or techniques. The contributions reveal tendencies to monopolize the design of processes centered on problem resolution (Nesta, 2017),48 as well as data epochalism (GPAI, 2021) and data imaginaries centered on big data and digital technologies (Hintz et al., 2022), in scenarios for designing public innovation policies based on data or data science. However, according to the participants, the demand for greater data governance regulation, the growing commitment of data activism and the acknowledgment of the lack of resources in public administration change the perspective on the problem. Deepening the assessment of the impacts of using data science, AI or other data techniques in public policy indicates the need to review each stage of the project from the beginning with a perspective of equity and data justice.

Exploring how the structure of the data influences the definition of the problem (Klein & D’Ignazio, 2020; Guerra, 2021;49 GPAI, 2021) entails starting from an equity framework rather than an efficiency one because “Normalising power manifests in the way that the ensemble of dominant knowledge structures, scientifically authoritative institutions, administrative techniques, and regulatory decisions work in tandem to maintain and ‘make normal’ the status quo of power relations” (GPAI, p. 29).

A review of how the problem is defined also entails paying attention to those who participate in defining it. Here, it is important to convene the actors or public spaces that participated in the data collection process, in order to prevent greater inequities or gaps arising from a lack of knowledge of the data’s traceability, sources or limitations. In addition to this change, it is vital to include voices that allow delving into aspects related to the affected populations in order to evaluate the need to incorporate other data sources, other participation practices and other actors into the team.

All of this results in starting from a definition of the problem in an interdisciplinary, multi-actor and equitable manner. The question “Why is solving this problem a priority for your organization?” in the original datasheet must therefore consider these aspects. Determining the problem that public policy is trying to solve using data science must necessarily be focused on premises broader than the efficiency demanded by public agendas centered on automated or data-based decision-making. The collective analysis of the contributions from the workshops produces the following specific review, i.e. change proposal for the Datasheet.

48 See https://media.nesta.org.uk/documents/solved-making-case-collaborative-problem-solving.pdf

49 Guerra, J (Aug 2022) Towards a feminist framework for AI development: from principles to practice. https://aplusalliance.org/frs-first-cohort-of-feminist-ai-papers-have-arrived/

A second element to be analyzed based on the contributions about the status of data science projects affects the “Actions” stage. It is important to recognize that “data science can address new public agenda challenges and provide new perspectives for a social problem based on disaggregated data” and the need for “greater identification of the necessary data for the development of agreed public policies” (Appendix 3).

Therefore, answering the questions “How will the action be modified with the data science project?” and “Who executes the action?” (original Datasheet) may require a greater assessment of the selected activities in relation to the project’s objectives, its repercussion on the affected populations and the maturity of the available data to proceed with the selected actions.

At this stage, the involved actors from public administration, the characteristics of the data science team, the processes used to determine the actions and the previously established objectives must be reviewed with an emphasis on who carries out these project activities and how.

Another key aspect refers to the data mapping stage. Here, the contributions of the workshop participants were broad and diverse, yet focused on the risks of both the regulatory absence of governance and infringed rights, and the debate on data quality. It is gathered that “Regarding the intrinsic dimension of data, the main obstacles are: data quality (disaggregation stands out as an insufficient element), proportionality among recording, collection and use, and finally, the shortcomings of specific governance frameworks” (Appendix 3), coupled with the fact that:

Data quality, not only due to a persistence in lack of disaggregation but also due to a disconnection between the data that is collected and the data that society requires to solve public problems deepens distrust. Distrust in the state resulting from which data it records, uses and analyzes, and the increasing tension between openness and privacy is a future identified concern. (Appendix 3)

Although greater emphasis will be placed on this topic later, the issue of the risks implicit in data collection and use processes is highlighted regarding data science project design, as well as the quality of the data used by public administrations to solve situated problems. Regarding the latter, the need to go beyond disaggregation as a data justice variable and evaluate the confidence resulting from how public data policies are designed and with whom they are agreed is underscored.
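As a minimal sketch of how a design team might examine disaggregation during data mapping, the following Python snippet counts records per value of a disaggregation attribute and flags expected categories that are absent or sparse. The record layout, the `gender` field, the expected categories and the `min_share` threshold are hypothetical illustrations, not part of the Datasheet. Such a count only covers the disaggregation aspect; it says nothing about how the data was collected or with whom it was agreed, which the workshop contributions identify as equally important.

```python
from collections import Counter


def disaggregation_report(records, attribute, expected, min_share=0.05):
    """Summarize how well `records` can be disaggregated by `attribute`.

    records   -- list of dicts, one per record (hypothetical layout)
    attribute -- key to disaggregate by, e.g. "gender"
    expected  -- categories the team expects to find in the data
    min_share -- assumed threshold below which a category is flagged as sparse
    """
    total = len(records)
    values = [r.get(attribute) for r in records]
    # Records where the attribute is absent or empty cannot be disaggregated.
    unrecorded = sum(1 for v in values if v in (None, ""))
    counts = Counter(v for v in values if v not in (None, ""))

    # Expected categories that never appear in the data.
    missing = [c for c in expected if counts[c] == 0]
    # Expected categories that appear, but below the assumed share.
    sparse = [c for c in expected if counts[c] > 0 and counts[c] / total < min_share]

    return {
        "total": total,
        "unrecorded": unrecorded,
        "counts": dict(counts),
        "missing_categories": missing,
        "sparse_categories": sparse,
    }


# Illustrative, made-up records; not real data.
records = [
    {"id": 1, "gender": "woman"},
    {"id": 2, "gender": "man"},
    {"id": 3, "gender": "woman"},
    {"id": 4, "gender": None},
    {"id": 5},
]
report = disaggregation_report(records, "gender", ["woman", "man", "non-binary"])
print(report)
```

A report like this could serve as an input for the deliberation with the actors who collected the data, rather than as an automated verdict on its quality.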

Lastly, team formation was also discussed, not only as part of a necessary change in project design but also as a cultural change in the daily activities of public administration.

The participants highlighted the regular practice of assuming that, when it comes to data science project design, the state actors to be convened are those with the most expertise in data, technology or state modernization. On the one hand, this vision results in a design team restricted to technical knowledge. On the other hand, it excludes thematic decision makers within the state as well as thematic knowledge outside of it.

The variable of “mastering” the subject or data promotes endogamy in identifying the problem, establishing objectives and actions. Finally, it leaves no room for increasingly important aspects outside of the specific area of the problem: legal, communication, digital infrastructure and governance aspects.

The team’s interdisciplinary nature and the participation of other state officials and members of the civil society as well as the affected population in the various stages of the design are elements of change identified in the workshops and the study’s analyzed bibliography.

Therefore, the inequities, biases or limitations of a data science project originate not only from the data dimension, but also from who collects and analyzes the data and who implements the project.

Accordingly, the profile of the data scientist needed for inclusive public policy design is reviewed first, along with the type of team that must be formed to execute a project with these characteristics.

Second Dimension: Data Scenarios

FIG 4. Reviewed Datasheet - Data (Justice- Intersectional) Scenarios for Public Projects with Data Science

The workshop contributions to the data scenario for designing public policy projects generally transform the original Datasheet. The iterative nature of the data mapping and its characteristics under premises of feminist data activism and data justice will transform its stages in a comprehensive manner. However, in this section, we highlight its role in the definition of the problem, the project’s feasibility, objectives, analysis and ethics, and of course, the data mapping in itself.

The participants first mentioned the task and role of data in the public agenda in general, and in related projects in particular, as the following quote describes:

The data justice dimension places the focus on myths about data: having more data doesn’t mean greater state efficacy, not “strangling the data so they confess” in favor of an objective, preventing the data from being used to revictimize, considering transparency policies as a way to provide citizens with better tools (Appendix 3).

The identification of the data-centered problem for public policy decision-making contains myths and biases that need to be discussed from the start of the project. The localization of data justice to identify the problem entails assessing in greater depth who and how many are affected by the problem, and what data types and sources support the project’s initial design.

Prefeasibility as a stage after the identification of the problem and as a term related to public policy work ought to be reviewed considering data access and the field of data science.

The data scenario in its access dimension is focused on the absence, scarcity or lack of timely data for inclusive public policies. It therefore refers to a gap between the problems of interest and the data that should help solve them. The participants highlighted that these characteristics of current data persist despite the progress in openness policies.

In addition, aspects related to lack of knowledge or awareness about privacy or the limitations associated with “The opening of data for public problems, the lack of knowledge about what data is recorded by each government area and their subsequent interrelation” (Appendix 3) are determinant topics regarding how to assess the quantity, quality and access of the data for the project’s design. If the prefeasibility stage is limited to the use of data collected and offered to carry out the process without a more comprehensive approach, greater injustices in the objectives, actions and modeling may be incurred.

It is also useful to emphasize and redirect questions about whether or not it is necessary to use data science in public policy projects at the stage of reaffirming the feasibility of executing them. In a document by Guerra (2022), Peña and Verón are cited in terms of the use of AI through the following questions that can be transferred to the field in question: “Instead of how to develop and deploy an A.I. system, shouldn’t we be asking first ‘why to build it?’, ‘is it really needed?’, ‘on whose request?’, ‘who profits?’, ‘who loses?’ from the deployment of a particular A.I. system? Should it even be developed and deployed?”. At this stage, the following questions may be included: What competencies do we have to carry out the design and implementation of the project based on data science? Do we have a mapping of sufficient data to make the design inclusive? What data science approaches are used? What are the potential data injustice risks based on the access to existing data, its nature and governance?

For the design of a public policy data science project, the original Datasheet refers to a mapping of internal (state) and external data. As mentioned in the Inclusive Data Charter or in international agendas relative to big data, it is necessary to frame the data mapping through the lens of data source diversification, in addition to recognizing the contribution of other data agendas, collection modalities and standardization efforts. These topics are also perceived in the conceptions and contributions of the workshop participants.

Strengthening the data map with other actors, other sources and other practices creates new opportunities for limiting the misalignments among data, problems and solutions. A virtuous circle may be generated to incorporate other data generation or collection needs during the project’s design process that account for the various facets of the same public policy problem.

As expressed in the introduction of the section on analysis of results, the collective contributions are supplemented by the documentary analysis selected for the study when it is essential to indicate other review ideas at key stages. This is also so at the analysis stage. This stage may be seen as another black box of design and ought to be assessed with a focus on interpretability, access, transparency and communication requirements of data-intensive projects.

Without specifying the different types of analyses or models that may be chosen by the project’s design team at this stage, it is interesting to mention the proposal made by Eitzel (2021) in her Modeler Manifesto to qualify the workshop contributions. The author describes how to make data science (its models) a political field based on collaborative practices and processes, hybrid techniques and alternative inquiries.50

The Manifesto allows delving into the non-neutral aspect of selecting the types of techniques and the criteria that must be reviewed by the teams in data analytics. Once again, the team’s and the data sources’ interdisciplinary nature are incorporated at the analysis stage in terms of relevance to achieve a public policy appropriate to data justice and inclusion requirements.

It is important to highlight that automated data-based decisions may be supplemented by qualitative techniques in which the affected populations, the civil society organizations with expertise in design or even other areas of the state may be convened to agree on, deliberate on or participate in this stage, and thus test and validate the models with alternative practices before they are used in the project’s final stage.

50 To broaden the debate, the author’s statements that a future data science may be more trustworthy, transparent, fair and inclusive can be analyzed from a local perspective.

The protection of personal data and the risks of using sensitive data in areas like health, or in others in which vulnerable populations are revictimized, are causes of concern. The data ethics aspect is perceived as an element linked to rights violation rather than data quality, standardization or nature. Access to data and its openness is also seen as an ethical issue.

Issues such as equity, or proportionality in terms of whether it is appropriate to use data science for the problem, are not directly expressed. For the participants, raising awareness within the State about ethical issues and current data regulations is key.

The original datasheet does not contain data justice aspects as such, but it does contain elements relative to data ethics narratives that were reviewed based on the A+ Alliance call.

Third Dimension: Justice Design in Data Science Projects

FIG 5. Reviewed Datasheet - Fair Design in Data Science Projects: An Imaginary for Action

The last collective dynamic carried out relates to the fair design dimension, which must be explored when designing data science projects.

The dimension identified for a collective debate needs to be briefly contextualized. Revilla (2022)51 analyzes the regulatory AI frameworks in the region based on components for integrating a human rights and gender perspective. In this regard, it is worth mentioning the exchanges in a virtual talk by the FAIR network, “AI governance and policies and data in Latin America: reflections from a feminist perspective” (Red FAIRLac, 2022), in which the participants mention the absence or scarcity of participatory processes in the drafting of regulations and the implementation of the regulatory AI frameworks in question.

This scarcity at the macro level of the design of public policies centered on AI, automation or data science is also reinforced at the micro level of the design of related projects in Latin America (Brussa, 2022). Based on this and the dimension proposed for the study about fair design, we express the importance of reviewing the stages of the original design based on suggestions debated in the virtual workshops.

Specifying the types of participatory processes inherent in or appropriate to each project design stage (Hintz, 2022; GPAI, 2021) will be left for other studies or a piloting stage. However, the imaginary resulting from the validation with the input of the November 2022 virtual meetings is presented.

We identified the need to transform the DATASHEET with new participatory proposals at the problem-posing, objective identification, actions and data mapping stages.

These proposals result from the localization of the principles of the Design Justice Network (2016) that are presented in the workshop dynamics with the titles: use of design to empower communities, focused on the affected communities, prioritization of design in the analysis of the impact on the affected communities rather than efficiency, recognition of other knowledge, sharing results, data and knowledge, integration of participatory mechanisms, and hybrid approaches for data collection and use of sources.

The participants prioritize recognizing other types of knowledge in the project design processes, analyzing the impact of the project’s actions on the affected communities, focusing on the needs of the affected community rather than on the success of the results, and incorporating participatory practices.

The document Advancing Data Justice Research and Practice (2021) mentions that “A goal is to elevate the ‘situated knowledge’ of communities to a status of influence in technology design and decision-making from which they are typically excluded” (GPAI). For the purposes of the study, design justice in data science projects for inclusive public policies is inherent in an imaginary in which participation in decision-making is also relevant within data science projects.

When to incorporate participation and the type of participatory mechanism to be implemented in the projects analyzed in this study will be the focus of subsequent analysis, as they go beyond its objective. In spite of this, Fig. 5 below incorporates the dimension proposed in the review of the DATASHEET as a cross-cutting element.

51 Revilla, T (2022). Integración de la perspectiva de género en las estrategias de inteligencia artificial de América Latina: de la narrativa a la acción. In INTELIGENCIA ARTIFICIAL FEMINISTA Hacia una Agenda de Investigación para América Latina y El Caribe. Available at https://archive.org/details/inteligencia-artificial-feminista/page/n1/mode/1up

THE COLLECTIVELY REVIEWED DATASHEET: Final Considerations

FIG 5. Datasheet for Inclusive Data Science Action

The core product of the study is the creation of the revised Datasheet to be reused by the original recipients and new activisms and civil society organizations.

The relevant changes were specified by intersecting dimensions and stages. Based on this, the Datasheet was modified as shown in Fig. 5, particularly mentioning the comprehensive stages, such as the project governance strategy, data justice and participation.

Project governance strategy: A strategy shall be devised by the design team that documents who will participate in the project’s formulation, implementation and assessment among the various parties (design team,52 state team, affected population, collaborators) and how. The contents and changes in the data and actor mapping stages will be decisive to determine the project’s roles, responsibilities, rights and data a priori.

To integrate inclusive action dimensions, project governance is essential as a non-technical element of data science and AI designs in the region. This strategy is useful for raising awareness in relation to rights, emerging data science agendas and inclusive data attributes. Additionally, it can be an initiation document to generate the necessary public trust in terms of the algorithm auditing processes, micro public participation mechanisms and transparency called for in the workshops.

Data Justice: Although it may be understood as a mere narrative change in a project’s design, mainstreaming the project’s entire cycle within the framework of practices and principles related to the field of data justice is a structural change in the Datasheet.

The difficulty of directly presenting the term in public exercises such as those performed in the workshops was noted. However, valid dynamics and narratives can be established for its localization in the public sector. The main objective is to demonstrate the risk of devising a stage in isolation from data ethics and, therefore, to adjust the DATASHEET to broader considerations for reflection by data science teams. Starting by exploring the risks in terms of inequality and social justice that a public policy project design based on data science models entails is a distinctive trait.

52 It differs from a state team because it is often not composed entirely of public officials.

Including the state officials that participated in the collection of data to be used in the project, supplementing quality by attributes of equity and data source hybridness, and opening participatory processes to assess the impacts of the project’s actions are variables that change when thinking about the cycle based on other questions related to fair content. For public problems, a shift toward data justice is vital to prevent potential “ethical washing” in AI policies in the public sector.

Participation: In the workshops, this element challenged the participants, as it is not among the factors usually considered when thinking about public data science. The difficulty in implementing participatory processes and the complexity in identifying the appropriate times and methodologies may be obstacles that undermine its public value.

On the contrary, both in the workshops and the documentary analysis of the study, strengthening co-production, co-creation or other modalities of participation in the decision-making process of a project design at its micro level stood out as a turning point. This intervention may be very useful for a meso and micro level of the public policy cycle in general and public policies based on AI in particular.

Latin America has a substantial deficiency in implementing participation, both in regulatory AI frameworks and design projects. Participation is an essential aspect in the actor mapping, problem definition and data mapping stages. Examples like the ones provided by the DataLabJustice report (Hintz, 2022) and the AI agenda for Latin America of the FAIR Lac Network can be a route to establishing a closer relationship between data science and inclusive public policies. Fair design that integrates the affected populations, as well as data feminist principles or those emerging from the action plans of local open governments (Acción Colectiva, 2022)53 in gender and diversity policies, incorporates specific actions rather than ethical principles to be discussed.

Data science design imaginaries for inclusive public policies show that actors in the public sector, the open government movement and feminist activism in the region require devising participation mechanisms (not necessarily institutionalized) when designing a public project with data science, piloting it and analyzing its developments.

  1. General Contributions: Action Routes for Designing Inclusive Projects

    • A Project Design Datasheet is a non-technical instrument that is very useful for raising awareness among regional actors regarding the obstacles and opportunities of data science and AI with a gender and diversity perspective and feminist principles. [AI inclusive PROVOCATION/ACTIONS: BEYOND TECHNICAL STRATEGIES | NO WASHING BIAS]

    • Elements such as forming a data science team, devising participatory processes and data literacy play a decisive role in moving from principles to action for feminist AI as well as the development of models, algorithms or related digital tools.

[AI inclusive PROVOCATION/ACTIONS: TRAINING|DATA (SCIENCE) LITERACY | AI AUDITING | DATA PARTICIPATORY MODELS]
  • Regulations and policies on official statistical data have little bearing on studies about feminist AI in the region, yet they constitute a valid and influential ecosystem in the technological, data and gender agenda. Aspects related to the types of existing gaps and technical debates on data quality or source diversity are discussed more there than in AI and gender scenarios.

  1. Acción Colectiva (Sep 19, 2022) Informe de Políticas de Gobierno Abierto con perspectiva de género. Available at http://www.accioncolectiva.net/contentFront/publicaciones-1/politicas-de-gobierno-abierto-con-perspectiva-de-genero-68.html

[AI inclusive PROVOCATION/ACTIONS: CITIZEN-GENERATED DATA | OFFICIAL STATISTICAL DATA GAPS | DATA CAPTURE | COLLABORATIVE DATA MAPPING PROJECTS]
  • Ethical issues are the prevailing narratives in data science or public policy spaces, unlike in civil society, where the concern is related to participation and the recognition or revictimization of the populations affected by the designs.

[AI inclusive PROVOCATION/ACTIONS: DATA JUSTICE LITERACY | DIGITAL RIGHTS LITERACY for AI practices]

  • Data justice is a narrative that needs to be localized by the academic sector or activist movements for the public sector.

[AI inclusive PROVOCATION/ACTIONS: LOCAL DATA JUSTICE POLICIES]
  • Project and data governance is absent in project design, and it is an element that incorporates equity, justice and inclusion into public policies.

[AI inclusive PROVOCATION/ACTIONS: DATA GOVERNANCE NETWORKS | DATA GOVERNANCE BY FIELD | DATA COLLABORATIVES]
  • Digital infrastructure is a factor that was dismissed by the participants in the design cycle, and it is a key topic in assessing risks or identifying a mapping of public technologies in the region.

[AI inclusive PROVOCATION/ACTIONS: PUBLIC (AI) TECHNOLOGY VIEW]
  • Public officials are highly interested in how to design these types of public policies, but the lack of teams with knowledge related to data and gender and feminist data science is acknowledged.

[AI inclusive PROVOCATION/ACTIONS: INTERSECTIONAL VIEW | COCREATION | OPEN GOV | PROTOTYPE]
  • The need for public advocacy to strengthen data science and AI project auditing in the region is highlighted.

[AI inclusive PROVOCATION/ACTIONS: BEYOND DESIGN | BEYOND MODEL]
  • The rights to privacy and to disaggregated data prevail over other rights or actions with emerging activity in the region, such as participation in AI projects, communication of the use of algorithms in public policies, collective audits, and the right to information regarding automated decision-making.

[AI inclusive PROVOCATION/ACTIONS: BEYOND DISAGGREGATION | AI ALTERNATIVE IMAGINARIES]
  • Interdiscipline was repeatedly mentioned in the workshops. However, academic or government policies that perpetuate working in silos and competitive processes between agendas and funding are highlighted.

[AI inclusive PROVOCATION/ACTIONS: CROSSOVER AGENDA]

ACKNOWLEDGEMENTS

The authors would like to acknowledge the other member of the project team, María Paz Hermosilla (GobLabUAI), as well as the workshop participants and the collaborating NGOs, such as ILDA, CIECTI and Abriendo Datos Costa Rica. We would also like to thank our friends and the research fellows for their comments and support in completing this project. Thanks to all of you who are there!

APPENDIXES

METHODOLOGICAL
  1. Workshop Methodology Summary – CANVA

Note: This methodology is a remix of another one used during the workshops. This modified proposal keeps the main focus and dimensions of research.

The goal is to design a future matrix of methodologies for data justice literacy and participatory AI strategies.

Link https://zenodo.org/record/7600762#.Y9witHbMLIU

PROPOSALS AND CONTRIBUTIONS

  1. Data Science Project Scoping Worksheet
  2. Workshop RESULTS in Original Language

LINK: https://zenodo.org/record/7601038#.Y9w4FHbMLIU

Data Science Project Scoping Worksheet

This worksheet is designed for social good organizations (government agencies, non-profits, social enterprises, and others) to scope actionable data science projects.

  1. Project Name:

  2. Organization Name:

  3. Members of the project design team:

  4. Problem Description:

    1. What is the institutional context?

    2. What is the problem you are facing?

    3. Who/what is affected by this problem? (people of certain type, organizations, neighborhoods, environment)

    4. How many people/organizations/places/etc? Generate a table with a quantification disaggregated by the criteria you have available such as gender, age, socioeconomic situation, territory, race or ethnicity, etc.

    5. How much does it affect them? (e.g., average waiting time for surgery, number of students who drop out, cost due to tax evasion, etc.)

    6. What are the causes of the problem?

    7. What are the current measures to address the problem, and what are their shortcomings?

    8. How have other projects used data science or artificial intelligence to solve similar problems?

https://www.algoritmospublicos.cl/ and https://www.dssgfellowship.org/projects/
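The disaggregated quantification requested in the problem description can be produced with very little tooling. The following is a minimal sketch in Python, assuming the affected population is available as a list of records; the field names (`gender`, `age_group`, `territory`), the values, and the helper names are hypothetical and do not come from the worksheet.

```python
from collections import Counter

# Hypothetical records of people affected by the problem; the field names
# and values are illustrative assumptions, not data from the study.
affected = [
    {"gender": "female", "age_group": "18-29", "territory": "urban"},
    {"gender": "female", "age_group": "30-59", "territory": "rural"},
    {"gender": "male", "age_group": "18-29", "territory": "urban"},
    {"gender": "non-binary", "age_group": "30-59", "territory": "urban"},
    {"gender": "female", "age_group": "18-29", "territory": "urban"},
]


def disaggregate(records, criteria):
    """Count records for each combination of the given criteria.

    Missing attributes are counted as "unknown" so that data gaps
    stay visible instead of being silently dropped.
    """
    return Counter(tuple(r.get(c, "unknown") for c in criteria) for r in records)


def print_table(counts, criteria):
    """Print one row per combination, with its count and share of the total."""
    total = sum(counts.values())
    print(" | ".join(criteria) + " | count | share")
    for key, n in sorted(counts.items()):
        print(" | ".join(key) + f" | {n} | {n / total:.0%}")


by_gender_territory = disaggregate(affected, ["gender", "territory"])
print_table(by_gender_territory, ["gender", "territory"])
```

Counting missing attributes under an explicit "unknown" label is a design choice of this sketch: it makes gaps in the available criteria part of the table rather than hiding them.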

  5. Pre-feasibility analysis

    1. What faculties does the institution have to act on the problem?

    2. Will it be necessary to partner with other public bodies or organizations?

    3. Where has it been stated that solving the problem is a priority? (E.g. government program, strategic plan, speeches by authorities, etc.)

    4. Does the relevant data exist and, more importantly, can we access it (enough to be able to change the current way in which the problem is responded to)? Is the data disaggregated according to the size of the affected population?

    5. Do we have the human and financial resources to carry out the project?

    6. What are the risks of the project (ethical, social license, implementation, etc.)?

Goals

What are the business/policy goals that will be accomplished by solving this problem and what constraints do you have? (in order of priority)

  • The technical solution that will be built (e.g. predictive model or dashboard or map) is not the business/policy goal - that is the tool that will achieve your goal

  • The goal should be specific and measurable

  • Achieving the goal should help solve the problem you’re tackling

  • Typical goals include improving/maximizing/increasing or decreasing/mitigating/reducing some outcome or metric (such as school graduation rates or unemployment rates).

  • Typical constraints include budget, lack of human capital, legal restrictions, political will, and social license.

  • Consider tradeoffs between conflicting goals.

| # | Goal | Constraints |
| --- | --- | --- |
| 1 | | |
| 2 | | |
| 3 | | |

Actions

  • Actions are activities or programs that institutions are doing/will do to address a problem. Actions can involve allocating resources, such as inspecting facilities, providing preventive services, outreach, etc.

  • Actions should improve when the institution has the information that is generated in the project.

  • Ideal actions should help you achieve the goals defined above.

| Time | Question | Supervise compliance with fishing quotas in industrial and artisanal landings in ports |
| --- | --- | --- |
| Before the action | What input is needed to carry out this action? | Ships in the port; resolution indicating fishing quotas per vessel. |
| | Who delivers the input to carry out this action? | Each inspector decides, based on their expert judgment, which vessels to inspect. |
| | What does the action consist of? | Verify the species caught, tonnage and percentage of occupation of the hold, according to the assigned quota. |
| During the action | Who carries out the action? | An inspector from the Inspection Department. |
| | How often is this action performed? | Daily. |
| | Where is the action carried out? | In the ports. |
| | What is the result of the action? | Controlled vessels and determination of compliance with fishing quota. |
| After the action | Who receives the result of the action? | Head of audit. |
| | What do they do with the result of the action? | Initiates an administrative sanction process. |
| | How do we want to change the action? If it is necessary to include a dimension of equity in this change, mention it. | The system will recommend vessels that must be inspected documentarily and those that must be inspected in person, based on levels of risk of non-compliance with fishing quotas. Currently, this selection is based on expert judgment. |
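The change described in the fishing example, recommending a documentary or an in-person inspection according to the risk of non-compliance, can be sketched as a simple triage rule. This is only an illustration under stated assumptions: the thresholds, the "no inspection" category, the capacity limit and all names are hypothetical, and the worksheet does not specify how risk levels are computed.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the worksheet does not define risk levels.
IN_PERSON_THRESHOLD = 0.7
DOCUMENTARY_THRESHOLD = 0.3


@dataclass
class Vessel:
    name: str
    risk: float  # assumed estimate of the probability of non-compliance (0 to 1)


def recommend_inspection(vessel):
    """Map a risk score to a recommended inspection type."""
    if vessel.risk >= IN_PERSON_THRESHOLD:
        return "in person"
    if vessel.risk >= DOCUMENTARY_THRESHOLD:
        return "documentary"
    return "no inspection"


def daily_plan(vessels, in_person_capacity):
    """Rank vessels by risk and cap in-person inspections at the available capacity."""
    plan = {}
    slots = in_person_capacity
    for v in sorted(vessels, key=lambda v: v.risk, reverse=True):
        kind = recommend_inspection(v)
        if kind == "in person":
            if slots > 0:
                slots -= 1
            else:
                kind = "documentary"  # downgrade when no inspector is available
        plan[v.name] = kind
    return plan


vessels = [Vessel("A", 0.9), Vessel("B", 0.8), Vessel("C", 0.5), Vessel("D", 0.1)]
plan = daily_plan(vessels, in_person_capacity=1)
```

In this sketch the output is a recommendation only; the final choice of which vessels to inspect would remain with the inspector, as in the current process.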

Data

  • The data has to connect to the actions it is informing so that the organization can achieve its goal.

  • Typical data science projects use administrative data as the primary data source, and enhance it with publicly available data sources (Census, other open data).

Partnering with the private sector or non-profits could be a way to obtain data you might be missing internally.

  1. What data sources do you have internally?

Complete. You can add or remove columns depending on the number of databases.

| | Database 1 | Database 2 | Database 3 |
| --- | --- | --- | --- |
| Name (e.g., Hospital discharge system) | | | |
| What's in it? Describe attributes in as much detail as possible, e.g., admission and discharge records from hospitals nationwide, with socio-demographic patient data, diagnosis, days of stay, type of health insurance, doctor information | | | |
| What level of granularity? (e.g., transaction, person, organization, location) | | | |
| How often is it collected/updated once it is captured? (e.g., real-time, daily, weekly, monthly, annually, exceptionally) | | | |
| Do you have unique and trusted identifiers that can be linked to other data sources? (e.g., RUN, SSN, DNI) | | | |
| Who is responsible for the data? (e.g., Hospital Records Department) | | | |
| How is it stored? (e.g., in a database, PDF, Excel, SPSS) | | | |
| Additional comments | | | |

  1. What data can you get from external, private or public sources?

Complete. You can add or remove columns depending on the number of databases.

| | Data source 1 | Data source 2 | Data source 3 |
| --- | --- | --- | --- |
| Name (e.g., Air Quality Record) | | | |
| What's in it? Describe attributes in as much detail as possible, e.g., concentration of airborne contaminants such as particulate matter of different sizes | | | |
| What level of granularity? (e.g., geolocated monitoring station per hour) | | | |
| How often is it collected/updated once it is captured? (e.g., daily) | | | |
| Do you have unique and trusted identifiers that can be linked to other data sources? (e.g., monitoring station code) | | | |
| Who is responsible for the data? (e.g., Ministry of the Environment) | | | |
| Are legal agreements necessary for the exchange/access to information? | | | |
| How is it stored? (e.g., database downloadable via API in an open data portal) | | | |
| Additional comments | | | |

  1. In an ideal world, is there additional data you would want to get/gather that would be relevant to this problem? (surveys, CCTV, phone records, DNA, different frequency or granularity for currently available data, etc.)
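The question about unique and trusted identifiers matters because internal and external sources can only be combined when they share a key. A minimal sketch of such a link follows, assuming two small lists of records joined on a hypothetical `station_code` field; the datasets, column names and function name are illustrative assumptions.

```python
def link_records(internal, external, key):
    """Join two lists of dicts on a shared identifier and report unmatched keys."""
    external_by_key = {row[key]: row for row in external if row.get(key) is not None}
    linked, unmatched = [], []
    for row in internal:
        match = external_by_key.get(row.get(key))
        if match is None:
            unmatched.append(row.get(key))
        else:
            linked.append({**match, **row})  # internal values win on conflicts
    return linked, unmatched


# Hypothetical example: both sources and their columns are illustrative.
discharges = [
    {"station_code": "ST01", "respiratory_admissions": 12},
    {"station_code": "ST02", "respiratory_admissions": 7},
    {"station_code": "ST09", "respiratory_admissions": 3},
]
air_quality = [
    {"station_code": "ST01", "pm25": 41.0},
    {"station_code": "ST02", "pm25": 18.5},
]

linked, unmatched = link_records(discharges, air_quality, "station_code")
match_rate = len(linked) / len(discharges)
```

Reporting the unmatched identifiers and the match rate, rather than only the linked rows, helps the team see which records (and potentially which populations) fall out of the analysis when sources are combined.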

Analysis

  • Typical data science projects include a combination of analysis, typically including description, detection, prediction, optimization, and/or behavior change.

  • Again, the analysis is not the goal of the project - the analysis helps you use the data you have to inform the actions you have access to in order to achieve your goals.

  • Choose the right set of analyses for each problem.

  • You must validate the analysis and use a validation process that matches how your analysis will be used in practice

Complete. You can add or remove columns depending on the amount of analysis to be performed.

| | Analysis 1: | Analysis 2: | Analysis 3: |
| --- | --- | --- | --- |
| Type of analysis (e.g., description, prediction, detection) | | | |
| Purpose of the analysis (e.g., understand people's historical behavior, estimate a patient's risk of disease, identify actions that would reduce overfishing) | | | |
| What actions will use the information generated by this analysis? (e.g., inspection of industrial and artisanal fishing vessels) | | | |
| How will you validate this analysis using existing data? (e.g., using historical data, running a randomized controlled trial) | | | |
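One way to validate an analysis with historical data, in a manner that matches how it would be used in practice, is a temporal split: estimate risk from earlier records and check it against a later period. The sketch below assumes a hypothetical inspection history and a deliberately naive risk estimate (the past share of non-compliant inspections per vessel); the data, the metric and the function names are illustrative and not part of the worksheet.

```python
from collections import defaultdict

# Hypothetical inspection history: (year, vessel, non_compliant). Illustrative only.
history = [
    (2020, "A", True), (2020, "B", False), (2020, "C", False),
    (2021, "A", True), (2021, "B", True), (2021, "C", False),
    (2022, "A", True), (2022, "B", False), (2022, "C", False),
]


def past_rate(records):
    """Estimate risk per vessel as its past share of non-compliant inspections."""
    totals, hits = defaultdict(int), defaultdict(int)
    for _, vessel, non_compliant in records:
        totals[vessel] += 1
        hits[vessel] += int(non_compliant)
    return {v: hits[v] / totals[v] for v in totals}


def precision_at_k(risk, outcomes, k):
    """Share of the k highest-risk vessels that were actually non-compliant."""
    top = sorted(risk, key=risk.get, reverse=True)[:k]
    return sum(outcomes[v] for v in top) / k


# Temporal split: estimate on years before 2022, evaluate on 2022 only,
# mirroring how the system would be applied to future, unseen landings.
train = [r for r in history if r[0] < 2022]
holdout = {vessel: nc for year, vessel, nc in history if year == 2022}

risk = past_rate(train)
score = precision_at_k(risk, holdout, k=1)
```

Here `k` stands for the number of inspections the institution can actually carry out, so the metric reflects the action the analysis is meant to inform.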

Ethical considerations

Proportionality

Do you think a data science/AI system is the right means to solve the problem? Why? Have you evaluated other alternatives?

What negative impacts could your project have? Review similar use cases identified in the "Problem definition" section

Social license

Do you think that users/affected parties will find the use of data raised to solve the problem acceptable? Why?

If the entire population of the country finds out about your project, will they approve it? Why?

Data protection

Are you working with individually identifiable personal and/or sensitive data? Which?

Have you identified the justification or legal basis for working with that data?

Have you identified regulations that could impact the project?

Will mechanisms be necessary to ensure the quality of personal data, such as access, deletion, or rectification mechanisms?

Transparency

Which stakeholders should be aware of the project?

Have you considered any mechanism for stakeholders to communicate with the institution about the project?

Will it be necessary to explain the decision-making mechanisms or analysis to be implemented? Why?

Discrimination/equity

What basic inequities are there in the process/environment where the project is inserted?

Are there specific (vulnerable) groups for whom you want to ensure fairness of outcomes or protection of their rights? E.g., groups defined by gender, age, location, social class, educational level, urban-rural, ethnicity

What biases do you think the data might have?

Accountability

If there is a request for information regarding the project, who oversees preparing the response?

Who is responsible if the system gets it wrong?

Do you have monitoring, control and evaluation mechanisms planned? How will they be documented, and how often will they be carried out?

Do you have training mechanisms in place to understand the responsibilities, legal and ethical obligations among the participating team?
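The discrimination/equity questions above can be supported by a simple first check: comparing how often each group receives a given outcome. The following is a minimal sketch, assuming hypothetical decision records with a `territory` attribute and a `flagged` outcome; the fields, the data and the gap measure are illustrative assumptions, and a gap by itself only signals where a closer, participatory review is needed.

```python
from collections import defaultdict


def selection_rates(records, group_field, outcome_field):
    """Share of records with a positive outcome within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[group_field]] += 1
        positives[r[group_field]] += int(bool(r[outcome_field]))
    return {g: positives[g] / totals[g] for g in totals}


def largest_gap(rates):
    """Difference between the highest and the lowest group rate."""
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions of a system that flags cases for review.
decisions = [
    {"territory": "urban", "flagged": True},
    {"territory": "urban", "flagged": False},
    {"territory": "urban", "flagged": False},
    {"territory": "urban", "flagged": False},
    {"territory": "rural", "flagged": True},
    {"territory": "rural", "flagged": True},
    {"territory": "rural", "flagged": False},
    {"territory": "rural", "flagged": False},
]

rates = selection_rates(decisions, "territory", "flagged")
gap = largest_gap(rates)
```

The same function can be run with any available criterion (gender, age, ethnicity and so on), or with combinations of them encoded as a single field, in line with an intersectional reading of the results.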

  1. Team building

| Organization/department | Description of the desired participation | Counterparty name/role |
| --- | --- | --- |
| IT Department | Provide data infrastructure | Head of IT Department |
| Statistics Agency | Provide population data | Head of the Department of Statistics |
