Rachel Wilkerson. Edge of Causal Imagination.

The Edge of Causal Imagination

Amid the much-hyped muddle of data science and omniscient algorithms, causality has emerged as the latest buzzword-beacon. The question of what causes what drives modern scientific enquiry with a relentlessness matched only by unscrupulous journalists hunting a sensational headline. (A point the webcomic xkcd drives home with its incendiary headline linking green jelly beans to acne!)

As far as the machine learning community is concerned, modern concepts of causality emerged contemporaneously on the east and west coasts from the research groups of Peter Spirtes[1] and Judea Pearl[2], respectively. These groups rely on a structure called a directed acyclic graph (DAG) to describe what causes what. A graph is the mathematical word for a drawing of dots with arrows between them (vertices and edges, if you aim to be fancy). It is a fantastic vehicle for informed hypotheses and wild guesses alike. A biologist can sit down and draw the pathway by which one protein phosphorylates another, just as a business executive can draw the supply chain for a product on Post-it notes. DAGs rank very highly in terms of transparency and interpretability.

This simple structure underpins great swaths of the literature in causal inference and artificial intelligence. The only requirement we make of the DAG is that it be free from cycles: series of edges that form a troublesome little circuit. On the one hand, this seems an entirely reasonable trade-off: no pesky little loops in exchange for powerful algorithms that take observational data and select whole model structures, uncovering those phosphorylation pathways for the biologists! And yet, even this should raise suspicions. Omitting any sort of cyclic feedback loop seems a rather severe oversight. To be sure, mathematicians have their ways around this, often consisting of unzipping the cycle and pinning it to a series of measurements in time.
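For the computationally inclined, the acyclicity requirement is easy to state in code. Here is a minimal sketch (the variable names are invented for illustration) that stores a DAG as a dictionary of edges and tests for cycles with the standard Kahn's-algorithm idea: a graph is acyclic exactly when its vertices can be peeled off in topological order.

```python
from collections import deque

# A toy DAG for the biologist's story: edges point from cause to effect.
# The vertex names are invented, not from any real pathway.
edges = {
    "kinase_active": ["protein_phosphorylated"],
    "protein_phosphorylated": ["pathway_on"],
    "pathway_on": [],
}

def is_acyclic(graph):
    """Kahn's algorithm: repeatedly remove vertices with no incoming
    edges; the graph is acyclic iff every vertex gets removed."""
    indegree = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for t in graph[v]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return removed == len(graph)

print(is_acyclic(edges))                     # True: a well-behaved DAG
edges["pathway_on"].append("kinase_active")  # add a feedback loop
print(is_acyclic(edges))                     # False: the loop breaks it
```

The one-line addition of a feedback edge is exactly the "troublesome little circuit" the formalism forbids.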

The field of causal inference is a wildly exciting place, full of people who have determined how to make algorithms more fair and translate natural human language into zeroes and ones. And yet, it suffers from a slight failure of imagination where structures are concerned. The world, bursting forth from primordial soup, did not after all emerge as a series of neat dots and edges. The poets would never be so dull as to impose a rigid DAG-like structure on the vibrant world. Hopkins in particular celebrated the breaking of form, holding that beauty came not from orderly repetition but from

All things counter, original, spare, strange;

Whatever is fickle, freckled (who knows how?)

Speaking for the scientists, Faraday encouraged coworkers to “ ‘embrace the monsters’ and explore alternative approaches to representation.”[3] The DAG pioneers undoubtedly and bravely forged a new and powerful language of causal algebras. But the complexity of the wicked problems facing us calls for that same spirit of discovery directed towards alternative forms of representation.

Rather than coercing the descriptions of biologists and CEOs to a vanilla DAG, what if instead the modeller modelled the problem dynamics first and then made room for nuance in our mathematical descriptions of causation? Careful listening, a virtue in rather short supply in 2020, has the potential to unlock beautiful, new representations of complex problems.


My personal exposure to the limitations of the DAG occurred in the realm of policy research. Working with a team of researchers to understand the dynamics of food insecurity in Texas, I talked with clients, administrators, and politicians involved in the complicated process of obtaining aid for food insecurity. Our team worked with the Summer Meals Program, which aims to provide meals for children during the summer months when two reliable school meals disappear. In an attempt to model the drivers of participation in the program, I hacked together a spreadsheet containing a curated list of possible drivers and asked a group of program experts to draw edges between them.

This may sound hacky, but eliciting structure from a group of experts represents a wacky, pragmatic branch of statistics called expert elicitation. Expert elicitation works in situations where we have an unwieldy problem, limited data, or both. Historically, the field emerged after the Three Mile Island incident. In the aftermath, as researchers sifted through the risk analysis, they noticed that groups of staff had assigned probabilities to rare events. Those probabilities were then combined and used to rank safety risks. To determine what causes what in a nuclear reactor meltdown, we might want to simulate a randomized controlled trial, in which we conjure a number of different Pennsylvanias, intervene on one element of the system, and then see what the knock-on effects are.
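The combination step can be as simple as a weighted average of the experts' probabilities, the so-called linear opinion pool. A minimal sketch, with entirely invented experts, events, weights, and numbers:

```python
# A linear opinion pool: a standard rule for combining expert probabilities
# into one ranking. Every name and number here is invented for illustration.
experts = {
    "engineer_a": {"valve_failure": 0.02, "pump_failure": 0.10},
    "engineer_b": {"valve_failure": 0.06, "pump_failure": 0.05},
    "operator":   {"valve_failure": 0.01, "pump_failure": 0.20},
}
weights = {"engineer_a": 0.5, "engineer_b": 0.3, "operator": 0.2}  # sums to 1

def pooled(event):
    """Weighted average of the experts' probabilities for one event."""
    return sum(weights[name] * judged[event] for name, judged in experts.items())

# Rank the safety risks by pooled probability, highest first.
ranked = sorted(["valve_failure", "pump_failure"], key=pooled, reverse=True)
print(ranked)  # pump_failure outranks valve_failure under these judgments
```

Choosing the weights is itself a judgment call, which is part of what makes elicitation such a pragmatic corner of statistics.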

Until science discovers a simulator for parallel universes, a DAG is one of the best proxies we have. Causal inference treats the vertices of the DAG as parts of the system of our nuclear reactor, assigns a particular context of being in Pennsylvania, and then intervenes on the system, observing how the probabilities update.
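The difference between observing a variable and intervening on it is worth a sketch. Assuming an invented three-variable reactor network (none of these numbers come from any real risk analysis), forcing the valve state cuts the edge from maintenance to valve, while merely observing a stuck valve also shifts our beliefs about maintenance:

```python
# A toy network: maintenance -> valve_stuck, maintenance -> meltdown,
# valve_stuck -> meltdown. All probabilities are invented.
P_maint = {True: 0.7, False: 0.3}    # P(maintenance is good)
P_stuck = {True: 0.05, False: 0.40}  # P(valve stuck | maintenance good?)
P_melt = {                           # P(meltdown | valve stuck?, maintenance good?)
    (True, True): 0.4, (True, False): 0.9,
    (False, True): 0.01, (False, False): 0.1,
}

def p_meltdown_do_stuck(stuck):
    """do(valve): force the valve state, cutting the maintenance -> valve
    edge, then average over the remaining uncertainty about maintenance."""
    return sum(P_maint[m] * P_melt[(stuck, m)] for m in (True, False))

def p_meltdown_given_stuck(stuck):
    """Observe the valve: conditioning also updates our beliefs about
    maintenance, because no edge is cut."""
    weight = lambda m: P_maint[m] * (P_stuck[m] if stuck else 1 - P_stuck[m])
    num = sum(weight(m) * P_melt[(stuck, m)] for m in (True, False))
    return num / sum(weight(m) for m in (True, False))

print(p_meltdown_do_stuck(True))     # ~0.55: risk if we force the valve stuck
print(p_meltdown_given_stuck(True))  # ~0.79: seeing a stuck valve also
                                     # signals poor maintenance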

While the Summer Meals Program administrators incur a rather lower level of risk than the reactor operators at Three Mile Island, the basic mechanics of causation in a DAG are the same. One participant asked me about a particular set of edges that only made logical sense in a particular context. The Summer Meals Program operates meal sites that are either open or closed to the public. Open and closed sites face very different challenges to participation, as only the latter has a captive audience (usually a summer school). However, the Department of Agriculture administers both in a very similar way. I would later learn to call this a structural zero: a question that only makes logical sense in certain contexts of the problem. But in the moment, I fumbled. How was I going to interpret this in a DAG?

The DAG dead-end forced me to examine the motivations for using a DAG in the first place. Largely, I wanted to use a DAG because I had access to a snazzy bit of software that would allow me to point and click to generate a graph, enter the probabilities, and click a button to do a sensitivity analysis. This would garner a certain appreciation from program administrators and policy wonks. But the structural zeros would require fudging those probabilities!

Embracing Faraday’s monsters, I embarked on a study of alternative representations of graphical models. One in particular resonated as a natural fit for the sorts of problems policy makers described. Rather than describing processes as edges between random variables (your dots or Post-it notes), people tell stories. Storytellers unfold processes as a series of events, a natural fit for a tree-based structure. Leaning into the storytelling aspect of elicitation, we might, instead of a DAG, grow a tree that depicts the possible unfolding of events. This removes the baffling issue of structural zeros: nonsensical series of events are simply pruned away!
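A minimal sketch of such an event tree, with invented branch names: the closed-site branch simply never grows an outreach sub-branch, so the structural zero never needs a probability at all.

```python
# A toy event tree for the Summer Meals story. Each root-to-leaf path is
# one possible unfolding of events; impossible stories never appear.
tree = {
    "site_opens": {
        "open_site": {               # open to the public: outreach matters
            "outreach_runs": {"child_attends": {}, "child_absent": {}},
            "no_outreach":   {"child_attends": {}, "child_absent": {}},
        },
        "closed_site": {             # captive audience: no outreach branch
            "child_attends": {},
            "child_absent": {},
        },
    }
}

def paths(node, prefix=()):
    """Enumerate every complete story the tree allows."""
    if not node:
        yield prefix
    for event, subtree in node.items():
        yield from paths(subtree, prefix + (event,))

for story in paths(tree):
    print(" -> ".join(story))
```

Where a DAG would demand a probability for "outreach at a closed site" and force us to fudge it to zero, the tree never asks the question.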


Other strange models have infiltrated the fringes of the causal inference literature. The multi-regression dynamic model chains together models to show how regions of the brain light up during an fMRI scan. The flow graph describes complex supply chains between agents under assumptions that break the vanilla DAG. Chain graphs elegantly link together layers of interactions. Each of these new forms emerged from statisticians who heeded the command of so many great artists, from Mary Oliver to Annie Dillard: pay attention! Watch the dance of the pixels in the fMRI data; ask the oil companies how their goods flow through the global economy. Listening to the words people use to describe curious causes and effects offers mathematical modellers a creative edge.


[1] Spirtes, P., C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2000.

[2] Pearl, Judea. Causality. Cambridge University Press, 2009.

[3] David Gooding, “Thinking Through Computing,” University of Warwick, 2–3 November 2007.


Copyright © 2010— 2023 Lancelot Schaubert.
All Rights Reserved.