An algorithm is a set of instructions that determines how a program reads, collects, processes, and analyzes data to generate an output.¹ Intentionally or unintentionally, bias can be introduced at any stage of this process.
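The stages named above can be sketched as a tiny pipeline; the data, names, and thresholds below are hypothetical, chosen only to mark where a designer's choices enter at each stage.

```python
# A minimal sketch of an algorithmic pipeline; each stage is a point
# where bias can enter (hypothetical data and rules for illustration).

def read_data():
    # Collection: whoever gathers the data decides who is represented.
    return [
        {"name": "Ada", "group": "A", "score": 80},
        {"name": "Ben", "group": "B", "score": 75},
    ]

def process(records):
    # Processing: a cutoff chosen by the designer encodes a judgment.
    return [r for r in records if r["score"] >= 78]

def analyze(records):
    # Analysis/output: aggregation hides who was filtered out upstream.
    return [r["name"] for r in records]

output = analyze(process(read_data()))
print(output)  # ['Ada'] — Ben's record never reaches the output
```

A reader of the final output alone cannot tell whether the cutoff at 78 was justified, which is exactly why bias at an intermediate stage is hard to see from outside.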
Imagine you are deaf and blind. All the information you receive about the world comes from a few sources, and depending on who those sources are, the information could be highly biased toward one category. Because you have no other sense of the world, the bias could take a long time to notice. We can think of computer algorithms in the same way. Imagine a highly specialized, cutting-edge machine learning system built by a private company for a government agency. Often, the agency lacks the in-house expertise to thoroughly test how well the system solves the complex problem it is meant to solve. This is what happened when police departments in the U.S. bought “cutting edge” facial recognition systems from private companies to help identify potential perpetrators: research eventually demonstrated that these systems were biased against certain population groups. Unfortunately, in the gap between when these systems were deployed and when they were banned after reviews showed them to be inaccurate and biased, people in several minority population groups were falsely accused, suffering lifelong damage that cannot be undone.
Depending on the ground truth data available, developers can perpetuate bias by not including labels that are required to describe certain phenomena. Machine learning models often require large, well-balanced datasets in order to deliver unbiased results. Algorithmic bias can also come from the people building the models: where data is incomplete, design teams often decide to omit or merge certain classes, either because they do not have enough data to represent a class or because they assume the difference would be insignificant.
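The point about well-balanced datasets can be made concrete with a toy calculation (the 95:5 split and class names are invented for illustration): on an imbalanced dataset, a model that effectively ignores the minority class can still look highly accurate.

```python
# A toy illustration (hypothetical data) of why an imbalanced dataset
# can make a biased model look accurate: a model that always predicts
# the majority class scores well overall yet fails every minority case.

labels = ["majority"] * 95 + ["minority"] * 5  # 95:5 class imbalance

predictions = ["majority"] * len(labels)  # model ignores the minority class

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
minority_recall = sum(
    p == y for p, y in zip(predictions, labels) if y == "minority"
) / 5

print(accuracy)         # 0.95 — looks good overall
print(minority_recall)  # 0.0 — every minority example is misclassified
```

This is why an aggregate accuracy number alone cannot certify that a system treats all groups fairly; per-group metrics are needed to expose the failure.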
It often takes a long time to detect the effects of biased algorithms, because they tend to affect data points that represent minority groups in the system. Depending on who is testing the system, even majority groups can be missed. For example, private companies are often not even based in the same country as the governments they deploy for, and they do not consult local experts when building these systems. That distance can lead to design decisions that introduce bias into their algorithms.
Coming up with algorithms to solve complex problems is not easy. These problems are usually experienced by real people and have real consequences. The initial goal when building such an algorithm is to represent all the data points fairly in order to solve the problem. As you begin to build it, however, different problems arise from different challenges: a lack of representative data, a lack of resources (labor, computation, skills, time, etc.), poor communication of requirements, a lack of proper background research on the problem, language barriers, or a lack of perspective from the people experiencing the problem. As a developer, running into one or more of these challenges can skew how you think about the problem altogether, and thus redefine what it really means to solve it. How these algorithms are built depends heavily on what the developer thinks is important for solving the problem. It is therefore very important that teams are diverse in thought, so that multiple perspectives inform which algorithmic design decisions might cause the algorithm to be biased, rather than those decisions being taken for granted.
These algorithms are rarely, if ever, perfect. Even during the earlier era of digitizing paper records, some alphabet letters were missing from the computer systems used to digitize those records. That algorithmic bias resulted in many people’s names being spelled differently and, as a result, taking on a different meaning altogether. With the more advanced algorithms in use today, however, algorithmic bias can mean entire population groups being punished or neglected disproportionately, because these new algorithms make bigger decisions that were previously made by human beings. One could argue that it is the responsibility of companies to share the shortcomings of their algorithms with their clients and to educate them on how best to mitigate those shortcomings. In reality, though, there may be little reward for doing so in complete honesty, and the company might not even know that detrimental bias is being perpetuated. The fact that there are no clear guidelines for implementing these novel solutions in the real world also means that accountability can be taken for granted. Another big factor is that the agencies and institutions that use these algorithms on the public are usually not transparent, and they make it difficult for the public to review the systems.
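The name-digitization failure described above is easy to reproduce. The sketch below (example names chosen for illustration) mimics a legacy system that could only store ASCII letters: it decomposes accented characters and silently drops what the character set cannot represent, altering the name.

```python
# A sketch of how a character set that lacks certain letters distorts
# names during digitization: forcing text into ASCII drops diacritics,
# which in many languages changes the word's meaning entirely.
import unicodedata

def ascii_only(name):
    # Decompose accented characters into base letter + combining marks,
    # then drop anything non-ASCII, as a legacy ASCII-only system would.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

for name in ["Nguyễn", "Müller", "Şenol"]:
    print(name, "->", ascii_only(name))
# Nguyễn -> Nguyen
# Müller -> Muller
# Şenol -> Senol
```

The transformation is silent and irreversible: once the record is stored, the original spelling, and whatever it signified, is gone.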