Untitled

In this post, I propose an approach to score AI systems in terms of how well they align with the interests of different groups, by measuring data creators' ability to reason about and configure the systems' underlying "data pipelines". This is meant to provide a path towards measuring alignment in a manner that supports plurality, i.e. these scores can help us predict when AI systems are likely to "facilitate cooperation and flourishing across a diversity of social groups".

This post was primarily inspired by discussions kicked off by OpenAI’s recent blog post. It is also informed by my recent participation in efforts to further define directions for “AI Safety” research and recent efforts to define and advance plurality research.

I’ve used the term "AI" in the title because this is the term being used in ongoing discussions, and I think this ship has definitively sailed (full disclosure: I also used the term in my dissertation, so I suppose I’ve already bought in a bit). More specifically, I am talking about computing systems that rely on some combination of distinct datasets for training, evaluation, and/or calibration. The discussion is especially relevant to “generative AI” systems that produce text and images, like ChatGPT and StableDiffusion, but it was originally influenced by earlier work on search and recommendation.

The post is structured as follows: a two-sentence definition, followed by a two-paragraph definition with more details, followed by an "FAQ" with even more details (the first version is really an IFAQ, because these are questions I imagine might be frequent). Finally, because this is a post I plan to update, I'll include a Change Log at the bottom of the page.

Two-Sentence Definition

An AI system is more aligned with a coalition if members (1) know how their data contributions flow to that system, (2) can reason about how changes to data flow might impact AI capabilities, and (3) have agency to reconfigure these data flows.

By measuring the alignment of various systems to various coalitions, we can understand alignment in terms of concentration: some systems are aligned to just a few individuals, others to very broad groups, and others to specific subgroups.

Two-Paragraph Definition

We can measure the relative alignment of data-dependent systems to the preferences of various coalitions in terms of data contributors’ knowledge about data flows and their ability to reconfigure those flows. Specifically, we set three thresholds: (1) a threshold for awareness of data pipelines, (2) a threshold for people's ability to reason about data impact, and (3) a threshold for minimum agency to configure data pipelines, and we attempt to count the number of people who meet these criteria. We make the following assumption: if people understand how their data contributions impact AI capabilities and can easily act to configure how these contributions flow to different organizations and systems, over time people will act (i.e., use the “data levers” available to them) to improve the capabilities of systems and organizations they support and hinder those they do not. We can score a system’s alignment by estimating the fraction of data contributors who meet some standard for data transparency and agency (i.e., are informed enough about data flows and can reconfigure their data flows at low enough cost). Under this definition, a system becomes more aligned to a coalition when more users are given high-quality information about data flow (including estimates of the data’s impact on capabilities) and low-cost options to redirect that data flow. A system can then be understood in terms of how its alignment is concentrated across different groups.
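To make the counting procedure concrete, here is a minimal sketch in Python. The threshold values and the Contributor record are invented for illustration; nothing here is a settled measurement standard. It simply scores a system's alignment to one coalition as the fraction of that coalition's contributors who clear all three thresholds:

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    """Hypothetical record of what one data contributor knows and can do."""
    pipeline_awareness: float    # 0-1: how well they know where their data flows
    impact_reasoning: float      # 0-1: how well they can reason about data's impact on capabilities
    reconfiguration_cost: float  # hours of effort needed to redirect or withhold their data

# Illustrative thresholds; choosing these is itself a policy decision.
AWARENESS_MIN = 0.7
REASONING_MIN = 0.5
MAX_COST_HOURS = 1.0

def meets_standard(c: Contributor) -> bool:
    """A contributor counts toward alignment only if they clear all three thresholds."""
    return (c.pipeline_awareness >= AWARENESS_MIN
            and c.impact_reasoning >= REASONING_MIN
            and c.reconfiguration_cost <= MAX_COST_HOURS)

def alignment_score(coalition: list[Contributor]) -> float:
    """Fraction of a coalition's data contributors who meet the transparency/agency standard."""
    if not coalition:
        return 0.0
    return sum(meets_standard(c) for c in coalition) / len(coalition)
```

The arithmetic is trivial; the hard part, of course, is actually estimating awareness, reasoning ability, and reconfiguration cost for real contributors.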

This definition most likely requires a collective approach to data agency — time costs may make it impossible for individual people to make decisions about how individual data records flow to individual firms (providing prohibitively expensive but "technically possible" actions is not agency). Instead, we could support data agency by scaffolding data-mediating organizations. Finally, this definition supports a pluralistic approach to alignment: given there are many distinct coalitions of people in the world, any given system has a variety of alignment scores, and any attempt to produce a single alignment score requires explicitly weighting the preferences of different groups. Instead, we can understand alignment scores as a variable to be used similarly to inequality measurements.
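To illustrate the "inequality measurement" framing, a rough sketch continuing the example above: given per-coalition alignment scores for a single system, we can summarize how concentrated that system's alignment is, for instance with a Gini-style coefficient. The coalition names and scores below are invented:

```python
def gini(values: list[float]) -> float:
    """Gini coefficient over non-negative scores: 0 = evenly spread, near 1 = concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formulation of the Gini coefficient on sorted values.
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * rank_weighted) / (n * total) - (n + 1) / n

# Invented numbers: the same system scored against several coalitions.
scores_by_coalition = {
    "platform employees": 0.90,
    "power users": 0.40,
    "casual contributors": 0.10,
    "non-users whose data was scraped": 0.02,
}
print(gini(list(scores_by_coalition.values())))  # higher value = alignment concentrated in fewer groups
```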

FAQ

Are there other hidden assumptions here?

There is an assumption that competing organizations and systems will exist. This definition is not particularly useful in an ecosystem dominated by a single player.

Why do we need to solve a governance problem with data contributions? What if people just vote directly on issues of AI Governance, or continue to vote for elected officials who pass legislation?

This idea is complementary to other mechanisms for AI governance. Indeed, a paradigm in which everyone has access to lots of information about AI systems and votes directly on AI-related policy could achieve the same outcomes. The data pipeline focus is particularly promising because it's very feasible; it emerges from a "natural" dependency in AI systems.

How does this idea relate to existing AI Alignment work?

In the blog post, OpenAI links to their Alignment page, which emphasizes the goal of making "artificial general intelligence (AGI) aligned with human values and follow human intent”.

This line of work includes research directions such as developing new techniques for collecting and using human feedback, training explanatory models that become part of the human evaluation loop, and performing core research on machine learning explainability and robustness.

The definition proposed above also relates to explainability, but the standards for "how explainable" are embedded in what we decide is a reasonable definition of “high-quality information about data flow”, and the standards for robustness are embedded in how much agency people are given to change their data flows.