Originally published on Medium.

“… computers, while they seem ever new, actually have a mechanistic way of limiting what we see and know, locking us into the present, all the while creating an illusion that we’re all-seeing. … We create programs using the ideas we can feed into them, the ideas in circulation at the time of programming, but then we live through the program, so we can forget the arbitrariness of the moment. We accept the ideas embedded in the program as facts of nature.” — Introduction to Close to the Machine, Ellen Ullman
Our lives are now data-driven. In fact, more data has been gathered in the past two years than in all the years before. Data has either been collected from you (maybe you clicked “I consent” on a website) or in relation to you (forms of hidden or mass surveillance) for so long that it precedes any first impression you could try to make on the world. Another trend is also rooted in this “data hype”: artificial intelligence (AI) technologies. Ever since the start of the deep learning revolution ten years ago, AI has promised to aggregate large amounts of data and quickly analyze it for insights into who we are and what we care about. And indeed, systems powered by AI algorithms are everywhere. They determine our credit score, how and who we date, job searches, hiring practices, and the transfer and sale of both digital and physical goods.

A feedback loop emerges: data is gathered en masse, then processed using AI, whose insights guide decision-making, which in turn justifies more intensive collection of data. This feedback loop poses problems for “Responsible AI”, the prevailing paradigm of how to build ethical AI systems. Until now, most research and industry work on AI ethics has focused on making algorithms more fair, or on making their classifications more accurate. The focus has been on improving static, one-off decisions so that AI can wring the most value out of the data provided to it.

However, this only addresses the symptoms of a deeper issue. AI and machine learning are not just a new way to analyze vast amounts of data — they also reorganize our understanding of the world around us. Doomscrolling on social media, speed-swiping on dating apps, panic buying through digital retail — these are new social activities wrought by runaway feedback loops between humans and automated systems. AI doesn’t change what already exists. It doesn’t alter data. But AI interprets data in a specific way, and that interpretation directly influences the way we choose to frame reality. AI has been changing the way we understand ourselves and others, and ultimately, changing the way the world works.

However much we rely on machine learning tools as sources of information, the promise of AI as an unparalleled data processor ought not to be confused with AI as a source of moral authority or political legitimacy. Excessive reliance on automated feedback loops is dangerous. Our lines of communication become more brittle and our decision-making more reactive than progressive. It allows the exponential spread of propaganda and information warfare, creating a false representation of reality, and it leads to a false sense of security that exposes us (even more than usual) to events that cannot be predicted from data alone.
How do machines understand the world?
We don’t typically put a lot of thought into the technology we use. I might open my laptop to get some work done, play an online game, or do a video call with family. Commuters take the subway to get to work, go to a concert, meet up with friends, or just to get out of the apartment. For many, Twitter is a way to follow the news, or see what celebrities are up to, or more generally see what’s trending. What exactly makes the last case different from the first two?
Personal computers and public transportation are similar in that they act as an interface between different parts of our lives. What makes those technologies function (whether a central processing unit or Automatic Block Signaling) is their ability to reliably provide the services we expect of them, without delay or disruption.

That is not how social media works. When you log onto Twitter, you are accessing an algorithm that has learned from a dataflow generated by the kinds of content that other users like you have engaged with in the past. When you then start to click on links or post comments, you add new data inputs to that flow. These may be fed back into the algorithm and change the kinds of content you see the next time you log on, as well as other users’ feeds if Twitter thinks they are like you. In other words, even if you don’t notice it, Twitter is incrementally different pretty much every time you use it. The reams of information about what you and others liked are funneled into a pipeline of predictions about the types of things you are likely to engage with.

Your laptop does not change itself every time you open it based on what other people happen to do on their laptops. Nor do the New York subway lines change based on what routes you most often take (however great that would be!). Those systems, by and large, are static — they don’t qualitatively change from use to use. To be sure, there is a longer-term, informal sense in which those systems get updated by corporations and cities. Apple monitors product sales and makes decisions about when and how to refresh iPhones or MacBook Pros; the New York City Transit Authority makes judgment calls about when to replace legacy cars with the New Technology Trains. But the way they get updated is much slower and more deliberative than the updates to news feeds on social media platforms. They also reflect a great deal of concern for the experience of the end user — what it feels like to use the product and form a relationship with it over time, rather than just accurately predicting what the user is likely to do at any given moment.

The machine learning behind news feeds is powered by automated feedback loops. These loops depend on three things: the data, the model, and what engineers call the “optimization problem” (i.e. how the system completes a given task). The data transforms the complexity of the world into zeros and ones so that it can be processed by an electrical system. The data is then organized according to its properties — funny or sad, men or women, news or entertainment — so that these can be aggregated and learned by an algorithm. The model is the result of what the algorithm learns — its representation of you, the user, based on what kinds of data labels you have engaged with in the past. The optimization problem determines what kind of model is learned and what the designer’s goal is when labeling data or training the algorithm. Maybe the designer just wants to show you the content you are most likely to respond to, or maybe they are testing out a “time well spent” version of Facebook that attempts to track your mental health or emotional state.
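To make that loop concrete, here is a minimal sketch of the three ingredients working together: labeled data, a learned model of the user, and an engagement objective. It is a toy illustration with made-up labels, weights, and scoring rules, not how any real platform implements its feed.

```python
# A minimal sketch of the feedback loop described above: data (labeled
# items), a model (per-user preference weights), and an optimization
# objective (rank by predicted engagement). All names and numbers are
# illustrative assumptions, not any platform's real code.

import random
from collections import defaultdict

# Data: each item has been labeled with a property the platform tracks.
ITEMS = [
    {"id": 1, "label": "news"},
    {"id": 2, "label": "entertainment"},
    {"id": 3, "label": "sports"},
    {"id": 4, "label": "news"},
    {"id": 5, "label": "entertainment"},
]

# Model: the user is represented as weights over those labels,
# learned entirely from their past clicks.
weights = defaultdict(lambda: 1.0)

def predicted_engagement(item):
    """Optimization objective: score items by how often the user
    engaged with this label before."""
    return weights[item["label"]]

def build_feed(items, k=3):
    """Show the k items the model expects the user to engage with most."""
    return sorted(items, key=predicted_engagement, reverse=True)[:k]

def record_click(item):
    """Feedback: every click becomes new training data, nudging the
    model further toward what the user already engaged with."""
    weights[item["label"]] += 1.0

# Each session, the feed is rebuilt from the updated model, so the
# system drifts toward whatever the user clicked last time.
for session in range(3):
    feed = build_feed(ITEMS)
    clicked = random.choice(feed)
    record_click(clicked)
    print(f"session {session}: feed={[i['id'] for i in feed]}, clicked={clicked['id']}")
```

Run a few sessions and the feed narrows toward whatever got clicked before, which is the incremental drift described above.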
How does that influence our understanding of the world? There is nothing inherently wrong with automated feedback. But if it gets out of control, it is easy for AI not just to provide more convenience through data analysis, but to redefine your understanding of what it is possible to do, and thereby change your activities.

For example, let’s compare a generic search engine to social media. On the former, you type in specific words related to the content you are looking for. An in-house algorithm then matches the sequence of words with the websites that most often include those words. This works pretty well because there is so much data available. But Facebook has built something even stronger: the actual network of people through which your social life operates. In theory, this means that Facebook has access to the most important properties through which your social life is encoded: the emotions, states, topics, and categories through which you perceive the world. Likewise, Amazon has assembled an unprecedented supply chain of buyers and sellers through which pretty much any type of good can be bought or sold.

Amazon and Facebook cannot literally predict what friends you want to add or what products you want to buy. But through a variety of techniques, these sites are becoming as good at fitting your behavior patterns to advertisements as they are at matching customer searches to online content. They don’t use data to add value to your life — they extract value by reorganizing your life around data collection. The result is machine learning systems that induce behaviors from human users but fail to understand their context, generating runaway feedback loops that can come at our expense. Facebook’s automated tools have deleted Catholic group pages because they misclassified prayer posts as spam; on Twitter, Microsoft’s Tay chatbot learned to post racist and sexist remarks by copying other users’ posts; teens on TikTok manipulated the “For You” algorithm en masse by feigning intent to attend Donald Trump’s Tulsa rally. These systems aren’t just inaccurate — they rewrite the rules of social interaction around themselves.

The difference from a generic search engine is that companies such as Amazon and Facebook create digital platforms that use AI to match producers and consumers. This is true whether the “producers” are social media stars or physical retailers, and whether or not the “consumer” is merely using a free service. The point is that the platform comprises a social ecosystem that generates wealth for the operator but may leave parts of its user base exposed to various types of preventable harms. The dataflow that makes this wealth possible — in particular its properties, which have been labeled to accommodate algorithmic matching between producers and consumers — is itself the source of many of these harms. As users become dependent on the vision of the social world the platform creates, they become subject to that platform’s own vulnerabilities.
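Returning to the search-engine comparison above, the core step of a generic search engine can be sketched as nothing more than matching query words to documents. This is again a toy example with invented documents, ignoring everything real engines add (link analysis, freshness, personalization):

```python
# A toy sketch of query-term matching, the "pull" model of a generic
# search engine described above. The documents and scoring are invented
# for illustration only.

DOCUMENTS = {
    "doc1": "how to repair a bicycle chain at home",
    "doc2": "best bicycle routes for commuting",
    "doc3": "chain restaurants open late",
}

def search(query, documents):
    """Rank documents by how many query words they contain."""
    query_words = set(query.lower().split())
    scores = {
        name: len(query_words & set(text.lower().split()))
        for name, text in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(search("bicycle chain repair", DOCUMENTS))
# The ranking responds only to the words you just typed; nothing you did
# yesterday changes it.
```

The contrast with the feed sketched earlier is exactly the point of this section: a search engine answers the query you gave it, while a platform feed ranks content against a learned model of your accumulated behavior.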
Why does it matter?
It might not be obvious why these issues are critically important for user safety, let alone society. After all, don’t we have the right to log off or use alternative services? Even if the information ecosystem of any given platform is restrictive, does it really affect how we live our lives?
There are several reasons why the systems that run Twitter or Amazon pose critical risks even beyond the terms of their own operation. Three of these are outlined below:
1. Systems are restrictive. Any AI system, no matter how sophisticated, is fundamentally limited by what philosophers call the problem of induction. This is a fancy term for the inability to generalize beyond examples of phenomena that happened to have been correlated in the past. If a child asks why the sun will come up tomorrow, and is told “because it always has before”, that isn’t a satisfactory answer. Likewise, AI depends on historical data to learn from and to determine its predictions of what can happen in the future. By definition, anything outside that narrow feedback loop is not just difficult to understand or uncertain in its likelihood — it is fundamentally impossible for the system to predict. The system is in effect blind to its possibility. This is what happened when a self-driving Uber in Tempe, Arizona collided with and killed a pedestrian walking their bike across the road — the perception system never settled on a correct classification for the pedestrian, and in a sense never “saw” a person in front of the vehicle. It follows that the use of such systems heavily restricts us to “known known” phenomena, and anything radically unexpected — so-called “black swan” events — completely short-circuits the AI feedback loop. If many interdependent systems are built in this way, they may be subject to cascading ripple effects when a fundamentally new phenomenon appears, which could lead to catastrophic failure scenarios. It is not possible for AI systems to represent all of reality within data, and therefore they will continually suggest a reality that leaves us especially susceptible to black swan events.

2. Systems are myopic. Despite its impressive capabilities, machine learning only consumes a very small amount of the world’s unstructured data. Most machine learning is what is known as supervised learning, meaning that it requires an enormous amount of active human involvement to identify the data structure, label it according to significant properties, and then monitor what is learned so that the classifier’s behavior remains appropriate, rather than harmful or completely arbitrary. For the feedback loop to work at all, it has to be extremely short-sighted in what it tries to predict. The problem is that the vast majority of human experience is not so well-structured that it can be easily fitted to this feedback loop, meaning it is effectively out of scope. A dating app may become pretty good at predicting the kind of potential partners you find romantically compelling, but it has no ability to predict what would make a relationship work over time, or even what would make a good one-night stand. It has no understanding of why we swipe; all it can do is predict which profiles you are likely to swipe right on. Anything outside of that problem definition is, from the system’s point of view, non-existent — which is a major problem as we come to rely on apps to shape our sense of the dating pool and how to act within it.

3. Systems are brittle. All the things that make AI systems good at learning from us — their observations, their ability to scale — also make it possible to manipulate or “game” them. If you spend hours on your smartphone scrolling through Instagram or Twitter, you are providing gobs and gobs of data for the system to learn from, largely without understanding what kind of content you’re implicitly supporting in the process. TikTok even monitors how long you “linger” on a given piece of recommended content — you don’t have to click or engage in any way for its algorithm to learn more about your viewing preferences. This sensitivity to multiple types of human input doesn’t make these systems smarter; it just makes them vulnerable to various types of information warfare. The line between hyper-media-savvy strategizing and straight-up digital propaganda has only become blurrier. Russia, in its war against Ukraine, has shut off access to alternative media platforms and used digital outlets to double down on its preferred narrative for the invasion; China likewise highlights content that favors the emotions, situations, and personalities that figure into its own geopolitical position. Machine learning, it turns out, is anything but objective. On the contrary, it erodes any interpretations of the world that cannot be optimized through its systems.
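To see how passive signals like lingering feed the same loop, here is a minimal sketch of dwell time treated as implicit feedback. The thresholds, labels, and weight updates are invented for illustration, not any platform’s actual parameters.

```python
# A minimal sketch of dwell time as implicit feedback: lingering on a
# piece of content is treated as an endorsement even though the user
# never clicked. The thresholds and weights are illustrative assumptions.

from collections import defaultdict

# Per-user preference weights over content labels, as in the earlier sketch.
weights = defaultdict(lambda: 1.0)

def record_view(label, seconds_on_screen, clicked):
    """Update the model from both explicit and implicit signals."""
    if clicked:
        weights[label] += 1.0          # explicit engagement
    elif seconds_on_screen > 3.0:
        weights[label] += 0.5          # lingering counts as interest
    elif seconds_on_screen < 0.5:
        weights[label] -= 0.1          # scrolling past quickly reads as disinterest

# A session in which the user never clicks anything still reshapes the model.
for label, seconds in [("politics", 8.0), ("sports", 0.3), ("politics", 6.5)]:
    record_view(label, seconds, clicked=False)

print(dict(weights))   # "politics" drifts upward without a single click
```

Because no click is required, a user, or a coordinated group of users, can shift the model simply by deciding where to pause, which is part of what makes these systems easy to game.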
Key take-aways
The power of machine learning lies not just in the data at its disposal, but in the models of reality it learns and the criteria by which those models are learned and updated over time.
The interaction between labeled (and unlabeled) data, learned model, and system behavior is the feedback loop that determines whether an AI system is good, harmful, or vulnerable to external forces.
As our lives are increasingly governed by digital platforms, these feedback loops heavily influence how we understand ourselves.
The tendency of systems to be restrictive, myopic, and brittle leaves us more and more vulnerable to system harms over time.
These risks are surmountable if we learn to do a better job of organizing how feedback loops manage our lives. That isn’t just good design practice — it may well be the next step in how we make AI systems more ethical and aligned with how we want to live.

Written by Thomas Krendl Gilbert and Megan Brożek.