Illustration by Somnath Bhatt
Seeing Like an Infrastructure
A guest post by Ranjit Singh. Ranjit has a doctorate in Science and Technology Studies (STS) from Cornell University, and is a Postdoctoral Scholar at the AI on the Ground Initiative of Data & Society Research Institute.
This essay is part of our ongoing “AI Lexicon” project, a call for contributions to generate alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI.
\ ˌre-zə-ˈlü-shən \
- the act or process of resolving: such as
- the act of analyzing a complex notion into simpler ones
- the act of determining
- the process or capability of making distinguishable the individual parts of an object, closely adjacent optical images, or sources of light
- a measure of the sharpness of an image or of the fineness with which a device (such as a video display, printer, or scanner) can produce or record such an image usually expressed as the total number or density of pixels in the image
The fingers of laborers were a never-ending source of annoyance. Lost fingers, damaged fingertips, and rubbed-off skin contours made fingerprints unrecognizable to a system that posits healthy, young bodies as the norm. Age, exposure to nature, and hard manual labor had worn off those marks that were perceived as infallible signs of physical individuality. The first effort at encoding usually failed (An excerpt from page 74 of Ursula Rao’s ethnographic study of the efforts to enroll the homeless into Aadhaar).
I was just beginning my dissertation research on India’s biometrics-based national identification infrastructure, Aadhaar (translation: foundation), when I read Ursula Rao’s 2013 Biometric Marginality study. Her study captured breakdowns in the founding assumption of Aadhaar: high resolution images of fingerprints combined with iris scans can signify both bodily and data uniqueness. This study was the starting point of my thought experiment on anthropomorphizing data infrastructures. What if a data infrastructure was a person? How would it see people it is designed to represent and manage in delivery of services? Rao argued that the way Aadhaar sees Indian residents is filtered by the expectation that they have healthy middle-class bodies with distinct biometric features. While identity broadly conceived is about who you are, its representation in and through data infrastructures is a consequence of standardizing data categories. This essay explores ‘resolution’ as a metaphor and an analytic resource to map the differential treatment of citizens based on how they are represented in and through data categories in the organization of government services. I will show that data infrastructures see people through data categories.
Resolution manifests as a spectrum, from high to low, in the uneven distribution of bureaucratic processes for creating and managing citizen data. This unevenness in turn shapes access to the rights and entitlements of citizenship such that those of ‘high-resolution citizens’ are expanded, while those of ‘low-resolution citizens’ are curtailed. Learning to see like an infrastructure, thus, can help us better analyze the emerging conditions of contending with surveillance, navigating precarity and inequality, and securing justice through data and algorithms.
The process of registration — whether it is the homeless trying to enroll into Aadhaar or any person signing up on any website — demonstrates a straightforward occasion to observe this insight in action. People who find it easier to provide data for core data categories of an infrastructure find it easier to claim their identity through it. The infrastructure sees them clearly. For people who cannot, the challenges of claiming identity through infrastructures only ramify. Catherine D’Ignazio and Lauren Klein capture this dynamic succinctly, when they argue that: ‘what gets counted counts.’ People who do not fit neatly into an infrastructure’s core data categories are often made invisible by placing them in residual categories, typically instantiated as ‘none of the above.’ Residual categories doubly silence people at the margins of data infrastructures: problematizing their place in core data categories while simultaneously rendering invisible their individual identity and social history.
Susan Leigh Star outlined several ways that people may find themselves residual in a data infrastructure: (1) their data is not registered; (2) they fall into two or more categories when only a single option is permitted; (3) they fall outside the infrastructure’s representational scope; (4) they are not believed by data clerks or data clerks do not perform data entry competently. These situations present profound challenges for people in their efforts to claim representation through data categories and by extension, deeply shape what infrastructures see. Consider, for example, that an infrastructure can only see a real-world box as a box when data is collected along three categories of its dimensions: length, breadth, and height. If data is collected only for one of these three categories, the box will look like a line. If data is collected for two categories, the box manifests as a rectangle. How data categories are combined, and what data is collected, deeply shape how an infrastructure sees people.
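The box example above can be sketched in a few lines of Python. This is only an illustration of the metaphor, not a description of how Aadhaar or any real system works; the function name `perceived_shape` and the label "nothing" for the case where no dimension is recorded are assumptions introduced here.

```python
from typing import Optional


def perceived_shape(
    length: Optional[float] = None,
    breadth: Optional[float] = None,
    height: Optional[float] = None,
) -> str:
    """Return what a (hypothetical) infrastructure 'sees' of a box,
    given which of its three dimensions were actually recorded."""
    # Count only the categories for which data was collected.
    recorded = [v for v in (length, breadth, height) if v is not None]
    # One category yields a line, two a rectangle, three a box.
    # "nothing" for zero categories is an assumption, not from the essay.
    shapes = {0: "nothing", 1: "line", 2: "rectangle", 3: "box"}
    return shapes[len(recorded)]


print(perceived_shape(length=2.0))
print(perceived_shape(length=2.0, breadth=1.0))
print(perceived_shape(length=2.0, breadth=1.0, height=0.5))
```

The real-world box is the same in all three calls; only the number of recorded categories changes what the infrastructure is able to see.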
Registration, however, is only the beginning. Data circulates and exhibits function creep. Aadhaar data is used by citizens to represent themselves to the Indian government for a variety of purposes. When this data is correct and represents citizens adequately, it produces everyday experiences of data-driven efficiencies in last mile delivery of government services. For example, citizens receive cash-based food subsidies, farming subsidies, welfare pensions, and scholarships directly from the government into Aadhaar-enabled bank accounts. When this data is incorrect, however, it makes it harder for citizens to represent themselves through Aadhaar in accessing government services. For example, I interviewed members of a Delhi-based NGO who were involved in enrolling the homeless into Aadhaar. As recording data on residential addresses emerged as a major barrier during this process, the NGO members were asked to provide the address of their NGO instead. A member pointed out that this workaround had severe downstream consequences:
Suppose we put in [the NGO’s] address in South Delhi for a homeless person who lives in North Delhi. This is a problem because if, for example, [he gets his Voter ID based on Aadhaar] he would only be eligible to vote for a constituency in South Delhi instead of North Delhi. Or worse, this person may not be able to lay any claim on where he lives in North Delhi, despite the fact that he has been living there for more than 20 years. Address matters! (Interview conducted in Delhi, 23 October 2015.)
While the NGO members were cognizant of these consequences, not all people possess this form of data infrastructure literacy. During my fieldwork, I consistently came across citizens who found it difficult to understand and navigate the networked logic of implementing Aadhaar and thus, struggled with claiming their Aadhaar-based welfare entitlements. How infrastructures see people is conditioned by how they navigate data flows across services and their emergent tactics of sharing and withholding data.
Ways of seeing are deeply connected with ways of knowing. In the context of data infrastructures, what they do and do not see can create conditions for entirely new ways of interpreting citizen data and classifying them. Aadhaar, for example, has not only created a new category of “unique” citizens in distribution of welfare entitlements, but also its inverse in “duplicate” citizens. There is an innate distance enacted between citizens and the street-level bureaucrats in organizing government services through data infrastructures. Citizens often have a limited role in how their data is interpreted across various bureaucratic realms. However, this does not mean that they have no agency in shaping the interpretation of their data. There are diverse ways to resist and make do with data representations and everyday experiences of living with data (see, for example, the Our Data Bodies project). At the same time, exercising judgments about data records is not necessarily easy for bureaucrats. The deeper a bureaucrat is in a dataset, the harder it becomes to imagine a reality outside of it. It is harder to deny welfare to a person when they are sitting right in front of you. It is easier to do it on a data record. Thus, data infrastructures lay the groundwork for interpreting citizen data such that they tend to not only limit people’s ability to see themselves fully in data, but also limit their ability to be seen by the people that administer the data.
In engaging with this thought experiment over the years, I have come to realize that two kinds of translation are required to see a person like an infrastructure: (1) translating a person’s identity into data records; and (2) translating the use of data records into lived experiences of data-driven delivery of services. These translations are embedded in the politics of building data infrastructures and engender the politics of the consequences of their appropriation. Together they produce a spectrum of resolution of people, from high to low, in the workings of data infrastructure. Here, I employ resolution in two ways: first, to evoke the level of detail in the object being imaged through an imaging system and second, to imply the act of solving a problem.
Resolution, as a visual attribute, refers to the degree of clarity of an image on a computer screen or optical instrument, expressed as a matter of scale: the lower the resolution, the less detailed the image becomes. For example, a low-resolution telescopic image will show the planet Jupiter as a small round ‘star’ (as in Galileo’s classic descriptions), while a higher-resolution image will reveal a large reddish spot, and still higher resolutions will reveal swirls and bands. Analogously, certain combinations of data provide more comprehensive pictures of people than others. The presence of granular data makes them more visible and produces a higher resolution of people. Resolution is also a matter of adjustment and scale: add, improve, or cross-reference data, and resolution increases (rather like the zoom and focus of a camera lens). A lack of representative data renders people and their activities less visible to a data infrastructure; they manifest in lower resolution. Subtract, degrade, or de-link data sets, and resolution declines. This visibility is not just a method of control and surveillance; it also conditions a person’s agency, existence, and rights as a citizen.
In the context of efficient service delivery, resolution also indicates the degree to which a data infrastructure achieves its function to uniquely identify a person. Some people readily fit into the core as well as interpretive data categories and hence are more easily identified through data infrastructures than others. Higher resolution implies a higher degree of achieving entity resolution. These two meanings together make up the politics of resolution. On the one hand, when the efforts to represent and claim representation through data align, they produce visibility in high resolution. People rendered in high resolution find it easier to align their data with their way of life. On the other hand, when the work of representing and claiming representation through data is misaligned, it produces visibility in low resolution. People rendered in low resolution struggle to overcome the differences between their data and their way of life. Higher resolutions often pose privacy and surveillance risks; lower resolutions broadly run the risk of data-driven marginalization and abjection.
Social justice in a data-driven world is a matter of rendering effective resolution to empower citizens with the affordances that data infrastructures for development are designed to provide. Effective resolution, as a balancing act between high and low resolution, is not a given. It is, rather, a contingent practical accomplishment. The diverse challenges of organizing citizen data (such as its registration, circulation, and interpretation) together elicit the different ways in which citizens come to struggle with their data representation. The solutions to these challenges are not just technical innovations (for example, processes of exception handling and use of proxies). These challenges require sustained efforts at data infrastructure literacy, public policy work, citizen engagement, activism, and most importantly, sustained sociopolitical and material work of people themselves to secure fairer representation.
For state actors, designing for effective resolution is crucial for managing citizens at scale. It raises questions such as: (1) which data categories should be mandatory in delivery of government services? and (2) how should accountability be distributed as different aspects of citizens’ records are managed by different bureaucracies?
For citizens, a consciousness around resolution offers a resource for understanding their representation through data in government services. This consciousness lends itself to questions such as: (1) how do core data categories of organizing government services mutually shape my ability to claim citizenship rights? and (2) how can I better fit within (or strategically disappear from) data categories while maintaining my relationship with the state?
Efforts to secure social justice must focus on figuring out practical strategies that empower people to advance their way of life through data infrastructures. These efforts may require ethicizing design and implementation of data infrastructures, or making do with the limitations of infrastructures to represent complex life worlds of people. Resolution, thus, invites us to ask: What are the new conditions of living in a data-driven world and how do they emerge in contests to secure representation through data?