Uncommon Knowledge
[Image: Dark Nebula LDN 483]

This is the first of two reviews of important new books, both of which have been added to my list of favorites. The second review addresses Rebecca Newberger Goldstein's The Mattering Instinct. As explained below, there is a profound connection between the two books' ideas.

Steven Pinker's When Everyone Knows that Everyone Knows: Common Knowledge and the Mysteries of Money, Power, and Everyday Life arrives at an opportune moment for anyone trying to understand how information flows through minds, organizations, and networks. The book explains a phenomenon that seems too obvious to require explanation yet proves so subtle that most of us have never properly understood it: the difference between what we know, what others know, and what everyone knows that everyone knows.

The distinction sounds almost pedantic until Pinker demonstrates its power. Private knowledge describes facts each person holds. Mutual knowledge occurs when two people both know the same thing. But "common knowledge" emerges only when everyone knows something and everyone knows that everyone knows it and everyone knows that everyone knows that everyone knows it, and so on, recursively, without limit. As Pinker explains, Hans Christian Andersen captured this structure perfectly in The Emperor's New Clothes. Before the child spoke, every onlooker knew the emperor was naked. After the child spoke, nothing factual had changed, yet everything had changed. The child's public declaration transformed private knowledge into common knowledge, and once that transformation had occurred, the collective pretense could no longer be sustained.

Pinker draws on decades of research in game theory, economics, cognitive psychology, and linguistics to show that common knowledge isn't merely an interesting curiosity, but rather fundamental infrastructure for cooperation, coordination, and social life itself.

Common Knowledge and the Architecture of Coordination

The formal treatment of common knowledge dates to the philosopher David Lewis's 1969 work Convention and reached its most celebrated expression in Robert Aumann's 1976 paper "Agreeing to Disagree." Aumann proved a startling theorem: if two Bayesian reasoners share a common prior and their posterior beliefs about some proposition are common knowledge between them, then those posteriors must be identical. Rational agents with common knowledge of each other's beliefs cannot "agree to disagree."

This result sounds unreasonable, but its mechanism is straightforward. When I know your belief and you know mine, and we both know that we know, the mere fact of your continued disagreement becomes evidence I must incorporate. If you persist in believing something after learning my contrary view, you must possess information I lack — information I should weight in updating my own belief. The recursive structure of common knowledge ensures this updating process converges to agreement.
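
To see the convergence at work, here is a minimal Python sketch of the announcement dialogue behind Aumann's result, in the spirit of Geanakoplos and Polemarchakis's "We Can't Disagree Forever." The state space, common prior, and information partitions are made up for illustration; everything else follows the theorem's setup.

```python
# Toy model of two Bayesian agents with a common prior and private
# information partitions who publicly announce posteriors until they agree.
from fractions import Fraction

STATES = range(9)                          # toy state space
PRIOR = {w: Fraction(1, 9) for w in STATES}
EVENT = {0, 2, 4, 6, 8}                    # the proposition: "the state is even"

# Hypothetical private partitions: what each agent can distinguish.
PART1 = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
PART2 = [{0, 1}, {2, 3, 4}, {5, 6, 7, 8}]

def cell(partition, w):
    """The agent's information cell containing state w."""
    return next(c for c in partition if w in c)

def posterior(possible, info_cell):
    """P(EVENT | agent's cell, restricted to the commonly possible states)."""
    support = possible & info_cell
    return Fraction(sum(PRIOR[w] for w in support & EVENT),
                    sum(PRIOR[w] for w in support))

true_state = 3
possible = set(STATES)                     # commonly known to be possible
while True:
    p1 = posterior(possible, cell(PART1, true_state))
    p2 = posterior(possible, cell(PART2, true_state))
    print(f"agent 1: {p1}   agent 2: {p2}")
    if p1 == p2:
        break
    # Announcements are public: eliminate states in which either agent
    # would have announced a different posterior.
    possible = {w for w in possible
                if posterior(possible, cell(PART1, w)) == p1
                and posterior(possible, cell(PART2, w)) == p2}
```

With these toy partitions the agents open at 1/3 versus 2/3 and, after one round of public announcements, agree on 1/2. The persistence of disagreement is itself evidence, and incorporating it forces convergence.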

The Aumann theorem matters because it identifies the epistemic infrastructure required for coordination.[1] Markets function because prices are common knowledge — everyone sees the same ticker, everyone knows that everyone sees it. Currency holds value because its acceptance is common knowledge — I accept dollars because I know you will accept them, and I know that you know that I know, recursively. Political movements coalesce when discontent becomes common knowledge — each dissatisfied citizen must know not only their own frustration but that others share it and that everyone knows everyone shares it.

Pinker weaves these abstract insights into vivid examples: financial bubbles that inflate when everyone assumes everyone else believes, revolutions that erupt when private grievances suddenly become public, the delicate dance of relationship escalation where each party must know that the other knows that they both want to proceed. The book reveals common knowledge as the invisible medium through which individual minds coordinate into collective action.

Transformers and the Mechanics of Agreement

For readers of this blog, Pinker's account may recall a framework I explored in an earlier essay, "The Transformer as Renormalization Group Flow." That essay argued that the forward pass through a transformer neural network implements something like a Kadanoff-Wilson renormalization group flow — coarse-graining microscopic token representations into stable semantic attractors. The attention mechanism computes a partition function, weighting each position by its relevance to the query, and layer depth traces a flow from ultraviolet (raw tokens) toward infrared (semantic content).

The connection to common knowledge runs deeper than metaphor. Consider what attention actually computes. At each layer, every token position issues a query ("what information am I seeking?") and every position offers a key ("what information do I provide?"). The dot product between query and key measures compatibility — how relevant position \(j\)'s information is to position \(i\)'s needs. The softmax normalization converts these compatibility scores into a probability distribution over positions, and the weighted sum of values produces an updated representation.

This process mirrors the structure of Bayesian updating that Aumann's theorem describes.[2] Each token position begins with a "prior" — its embedding plus accumulated context. Attention to other positions provides "evidence" — information from other perspectives. The updated representation integrates this evidence according to relevance weights. Layer by layer, the representations converge toward configurations where all positions "agree" on the relevant semantic content.

The Mathematics of Attention as Epistemic Coordination

The formal structure becomes clearer when we examine the attention equations directly. For a single attention head, the output at position \(i\) is:

$$z_i = \sum_j \alpha_{ij} v_j \quad \text{where} \quad \alpha_{ij} = \frac{\exp(q_i \cdot k_j / \sqrt{d})}{\sum_{j'} \exp(q_i \cdot k_{j'} / \sqrt{d})}$$

The weights \(\alpha_{ij}\) form a probability distribution over positions. Each position \(i\) "surveys" all other positions, assigns credence to each based on relevance, and forms a weighted combination of their values. This is structurally identical to a Bayesian agent updating beliefs by weighting evidence sources according to their reliability.
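
As a concrete rendering, here is a minimal NumPy sketch of this single-head computation. The shapes and symbols follow the equation above; the data are random stand-ins rather than learned representations.

```python
import numpy as np

def attention(Q, K, V):
    """Single-head attention: alpha = softmax(Q K^T / sqrt(d)), z = alpha V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # compatibility q_i . k_j
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=-1, keepdims=True)    # each row: a distribution
    return alpha @ V, alpha

rng = np.random.default_rng(0)
n, d = 5, 8                                       # 5 positions, width 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
z, alpha = attention(Q, K, V)
assert np.allclose(alpha.sum(axis=1), 1.0)        # rows sum to 1: credences
print(alpha.round(2))                             # who attends to whom
```

The assertion makes the epistemic reading explicit: every row of \(\alpha\) is a credence distribution over information sources.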

Now consider what happens across multiple layers. At layer \(\ell\), position \(i\) attends to representations from layer \(\ell - 1\). Those representations already incorporate attention from earlier layers — position \(j\) at layer \(\ell - 1\) has already integrated information from all positions at layer \(\ell - 2\). The recursive structure means that by layer \(L\), position \(i\)'s representation reflects information from all positions, weighted and reweighted through the entire network depth.

This recursive integration mirrors the recursive structure of common knowledge itself. At layer 1, position \(i\) learns what position \(j\) encodes — first-order knowledge. At layer 2, position \(i\) learns what position \(j\) learned about position \(k\) — second-order knowledge. The deep structure of transformers thus implements something very much like the posterior reached through agreement by two rational Bayesian agents starting from the same priors.[3] The essay and papers by Misra et al. are even more explicit about this connection to Bayesian inference. From this perspective, the potential for transformers to bridge epistemic gaps between humans seems like an obvious but underexplored and undertheorized possibility. Perhaps someday humans might negotiate agreement on a set of priors, then allow transformers to make decisions for them based on an agreed set of facts provided later at inference time. No need to disagree on the posterior anymore?
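
A toy sketch (mine, not from the essay or Pinker's book) of this depth-wise recursion: give each layer a deliberately local attention pattern, in which each position attends only to itself and its left neighbor, and watch the effective mixing matrix, the product of the per-layer attention matrices, spread information one more hop per layer.

```python
import numpy as np

n, L = 6, 5                                   # 6 positions, 5 layers

# One layer's attention weights: row-stochastic, nonzero only on each
# position itself and its left neighbor (a deliberately local pattern).
A = np.zeros((n, n))
for i in range(n):
    A[i, i] += 0.5
    A[i, max(i - 1, 0)] += 0.5

effective = np.eye(n)                         # mixing from inputs to layer l
for layer in range(1, L + 1):
    effective = A @ effective
    reach = int((effective[n - 1] > 0).sum())  # sources visible to last token
    print(f"layer {layer}: last position integrates {reach} positions")
# Prints 2, 3, 4, 5, 6: each layer adds one more order of "knowing about".
```

Real attention matrices are dense and input-dependent, but the compositional point survives: it is depth that stacks the higher-order levels.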

Open Access and the Infrastructure of Agreement

Pinker's analysis gains additional depth when read alongside Douglass North, John Wallis, and Barry Weingast's framework in Violence and Social Orders. (See my earlier review of their book here.) Their book distinguishes "limited access orders" — where personal relationships among elites structure political and economic life — from "open access orders" — where impersonal rules enable anyone meeting minimal criteria to participate in politics, markets, and civil society.

Common knowledge, as Pinker describes it, provides the cognitive infrastructure that open access orders require. Limited access orders can function through personal trust networks: I know whom I can rely on, and they know me. Such systems scale poorly because trust depends on repeated personal interaction, and any individual can maintain only so many relationships.

Open access orders achieve larger scale through different mechanisms: shared institutions, public rules, anonymous markets. These mechanisms work only when their reliability is common knowledge. I can use paper currency because everyone knows that everyone accepts it. I can sue in court because everyone knows the courts will enforce contracts. I can form a corporation because everyone knows the legal system will recognize its separate existence.

North, Wallis, and Weingast argue that the transition from limited to open access requires three "doorstep conditions": rule of law among elites, perpetually lived organizations (corporations, universities, governments that persist beyond any individual's lifetime), and consolidated political control of the military. Each condition creates common knowledge among elites that their agreements will persist and be honored.

Pinker's account suggests why these conditions matter. Rule of law creates common knowledge about how disputes will be resolved. Perpetual organizations create common knowledge about future interactions — I know the bank will exist tomorrow, and I know that everyone knows this. Military consolidation creates common knowledge that no private party can use violence to void agreements.

The achievement of open access, on this reading, is fundamentally an achievement of common knowledge at societal scale. Impersonal exchange becomes possible only when strangers can coordinate without personal acquaintance — and such coordination requires commonly known rules, commonly known institutions, commonly known enforcement. Modern prosperity rests on this invisible infrastructure of shared awareness.

Jokes, Intimacy, and the Boundaries of Common Knowledge

The connection between common knowledge and social coordination also illuminates work by the late philosopher Ted Cohen, whose Jokes: Philosophical Thoughts on Joking Matters explored humor as a mechanism of intimacy. Cohen observed that jokes are "conditional" — they require background knowledge that the teller and listener must share without explicit communication. A joke that depends on knowing that Abe and Sol are Jewish names, on familiarity with stereotypes about Jews and the Sabbath, and on recognizing these elements without being told succeeds only if teller and listener already share that background. The laughter confirms a prior communion.

Pinker's framework explains why jokes work this way. The background knowledge must be common knowledge between teller and listener — not merely mutual knowledge, but recursively shared awareness that both possess it. If I'm unsure whether you know the relevant stereotypes, I risk either confusion (you don't get the joke) or offense (you think I'm assuming something inappropriate about you). The safety of joking depends on our commonly knowing what we commonly know.

This insight extends to indirect speech more broadly. When someone says "nice weather, isn't it?" they're often not commenting on meteorology but initiating social contact. Both parties know this, and both know that both know. The fiction of discussing weather provides a safe medium for connection — if the other party doesn't want to engage, they can respond minimally without rejecting anything explicit.

Pinker devotes considerable attention to why humans often avoid common knowledge — why we speak indirectly, maintain polite fictions, pretend not to notice elephants in rooms. The answer lies in the irreversibility of common knowledge creation. Once something becomes common knowledge, it cannot be un-known. The emperor, once publicly declared naked, cannot restore the pretense of clothedness.[4] A relationship, once explicitly acknowledged, cannot return to ambiguity.

This irreversibility explains the value of strategic ambiguity. Couples who haven't defined their relationship preserve optionality. Diplomats who haven't explicitly stated demands preserve room for compromise. Colleagues who haven't acknowledged tension preserve the ability to work together. Common knowledge forecloses possibilities; its avoidance keeps options open.

The Consequentialist Case for Privilege

Pinker's framework also provides empirical foundation for consequentialist arguments for attorney-client privilege. Judge Richard Posner, in The Problems of Jurisprudence, defended evidentiary privileges not on grounds of inherent rights but on consequentialist grounds: by limiting compelled disclosure, privileges encourage candid communication between client and attorney and thereby improve the efficiency of the justice system.

The common knowledge framework explains why privileges actually work. Attorney-client privilege isn't merely a legal rule; it must become common knowledge between attorney and client that communications will remain confidential. The client must know the privilege exists; the attorney must know the client knows; the client must know the attorney knows the client knows. Only when this recursive awareness obtains can the client speak with the candor that effective representation requires.

Absent common knowledge of privilege, the client calculates whether this particular attorney can be trusted — a personal, limited-access calculation. With common knowledge of privilege, the client relies on impersonal institutional guarantees. The difference parallels North, Wallis, and Weingast's distinction between limited and open access orders: privilege transforms attorney-client communication from a personal trust relationship into an institutional structure that scales.

The same logic applies to spousal privilege, priest-penitent privilege, and other communicative protections. Each works by creating common knowledge that certain communications will remain private, enabling candor that would otherwise be impossible. The privileged relationship becomes a zone of coordinated privacy — a shared understanding that what is said stays said.

The Crisis of Common Knowledge

Pinker's analysis carries troubling implications for contemporary information environments. Social media fragments common knowledge. When different groups consume different news, share different assumptions, and inhabit different epistemic worlds, the recursive structure of common knowledge fails. I no longer know what you know; worse, I don't know whether you know what I think I know. Coordination becomes difficult because the shared foundation has crumbled.

The phenomenon of "context collapse" — where messages intended for one audience reach another — represents a pathology of common knowledge management. A joke that works among friends fails when overheard by strangers because the required common knowledge doesn't extend to the broader audience. The recursive assumption breaks down: I thought we all knew what we all knew, but "we" turned out larger than I assumed.

Cancel culture, which Pinker addresses directly, represents common knowledge weaponized. A transgression that was private knowledge — known to some but not publicly acknowledged — becomes common knowledge through viral exposure. The transformation from private to common knowledge triggers consequences that private knowledge alone would not. The mechanism isn't new (public shaming is ancient), but the scale and speed of common knowledge creation have transformed its dynamics.

Pinker's observation that we often strategically avoid common knowledge suggests a principle for information ethics: not everything that can become common knowledge should. Some truths are better left private or mutual rather than common. The emperor's nakedness perhaps need not have been publicly declared; the child's honesty was charming but destabilizing.

Toward the Mattering Instinct

Pinker's account of common knowledge, comprehensive as it is, raises a question it does not fully answer: why do we care so intensely about what others know that we know? Why does the recursive structure of common knowledge carry such psychological weight?

My review of Rebecca Newberger Goldstein's The Mattering Instinct addresses this question directly. Goldstein argues that humans possess a fundamental drive to matter — to prove to ourselves that we deserve the enormous attention we must give our own lives. This mattering instinct, she contends, explains both our greatest achievements and our most destructive conflicts.

Common knowledge provides the substrate through which mattering becomes possible. I matter only if others recognize that I matter — and specifically, only if it is common knowledge that I matter. Private recognition doesn't suffice; I need to know that others know that I exist, that I contribute, that I signify. The recursive structure of common knowledge maps onto the recursive structure of mattering: I care about your caring about my caring.

Pinker's framework thus sets the stage for Goldstein's deeper inquiry. Common knowledge explains how social coordination works; the mattering instinct explains why we care so desperately that it works. Together, they illuminate the cognitive and motivational architecture that makes us the social, status-conscious, meaning-seeking creatures we are.

Continue with the second review, covering Goldstein's The Mattering Instinct.


  1. Pinker discusses one resolution of the puzzle of why people disagree offered by Tyler Cowen and Robin Hanson in Are Disagreements Honest?. Cowen and Hanson conclude that typical disagreements are dishonest because most people tend to believe themselves better at reasoning than whomever they're arguing with but are reluctant to acknowledge that belief. In a separate essay, I may elaborate on how taking seriously the hard problem of consciousness as hidden relationality provides an alternative to a conclusion of dishonesty: the condition of agreement on prior beliefs is never perfectly achieved. Different priors arise from different relational perspectives, which arise naturally. There is a thermodynamic cost to certainty and agreement, which I have called the synchronization tax. We need not question each other's honesty a priori when we are a priori guaranteed different prior beliefs and the thermodynamic cost of certainty remains finite. ↩︎

  2. Chapter 4 of the book includes an interesting discussion of how infinitely recursive logic might be implemented in the human brain, pointing to recursive/iterated, reflexive/fixed-point, and self-evident/conspicuous renderings. In renormalization group flow terminology, the recursive/iterated rendering corresponds to a UV expansion, the reflexive/fixed-point to the IR fixed point, and self-evidence/conspicuity to gauge invariance and explicit symmetry breaking by an external field (the conspicuous event). In the last case, we can think of private knowledge as gauge freedom. Pinker notes that Aumann himself apparently has objected to the reflexive/fixed-point definition of common knowledge on the ground that it is circular. This is wonderfully ironic, since the recursive construction and the fixed point specify the same object! Equilibrium concepts are all circular in this sense. Perhaps Aumann prefers constructive definitions — definitions that build the object up step by step rather than characterizing it by a property it must satisfy? But this would seem to invert the actual epistemology. Nobody seems to achieve common knowledge by mentally iterating through infinite levels. Common knowledge arises when a situation obtains — public announcement, mutual eye contact, shared ritual — that constitutes the fixed point directly. The self-referential definition captures how common knowledge actually works; the recursive construction seems more like the artificial one. Worth noting also that transformers don't compute fixed points by explicit iteration from UV to IR. The trained weights encode the fixed point; the forward pass relaxes into it. The system finds the attractor because the energy landscape funnels it there, not because it explicitly recalculates an effective Hamiltonian level by level. Common knowledge seems to work the same way. When everyone sees everyone seeing the emperor's nakedness, the fixed point crystallizes instantly — not through infinite mental recursion but through the situation's structure directly instantiating the self-referential condition. Aumann's objection, if genuine and maintained, suggests he may have missed the deepest implication of his own theorem: that social knowledge, like physical systems, is governed by attractor dynamics rather than explicit construction. The "circularity" isn't a bug in the definition; it's the phenomenon itself. See also footnote 4 for a more extended analysis of symmetry breaking in The Emperor's New Clothes.
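
    A minimal sketch (my own toy example, with made-up information partitions) of that equivalence: iterate "the event holds and everyone knows the previous level" until it stabilizes, then check that the result satisfies the circular, self-referential definition directly.

    ```python
    def K(partition, event):
        """States where this agent knows `event`: their cell lies inside it."""
        return {w for cell in partition for w in cell if cell <= event}

    # Hypothetical information partitions for two agents over states 0-5.
    PART1 = [{0, 1}, {2, 3}, {4, 5}]
    PART2 = [{0, 1}, {2, 3, 4}, {5}]
    E = {0, 1, 2, 3, 4}                    # the event to become commonly known

    # Recursive construction: E, everyone knows E, everyone knows that..., etc.
    C = set(E)
    while True:
        nxt = E & K(PART1, C) & K(PART2, C)
        if nxt == C:
            break
        C = nxt

    # Fixed-point check: C satisfies the "circular" definition verbatim.
    assert C == E & K(PART1, C) & K(PART2, C)
    print("common knowledge of E holds in states:", sorted(C))
    ```

    The loop is the infinite regress collapsed into a finite computation; the assert is the self-referential definition checked directly. Both name the same object, which is the point. ↩︎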

  3. Not coincidentally, a single layer of the attention block and feedforward network within a transformer reproduces the so-called "Surprisingly Popular" algorithm. The Surprisingly Popular algorithm works by checking whether the specific vote (Q1) is strong enough to override the predicted consensus (Q2): if the feedforward network's signal (information gain) is strong enough, it "rotates" the vector away from the attention consensus toward the fact. The algorithm is effectively the mathematical rule that decides when the residual (feedforward network) should override the main path (attention). Mathematically, it produces the sum of signal and surprise (pointwise mutual information), exactly capturing how a feedforward network vector adjusts the attention output to override a hallucination with a fact. See discussion on X.
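
    A minimal sketch (mine, with made-up vote shares) of the Surprisingly Popular rule itself: respondents answer the question (Q1) and predict how others will answer (Q2), and the winning answer is the one whose actual vote share most exceeds its predicted share.

    ```python
    import math

    votes     = {"yes": 0.35, "no": 0.65}   # Q1: actual answer shares (toy)
    predicted = {"yes": 0.20, "no": 0.80}   # Q2: predicted answer shares (toy)

    # Surprise as a log-ratio: a pointwise-mutual-information-style score.
    surprise = {a: math.log(votes[a] / predicted[a]) for a in votes}
    print(max(surprise, key=surprise.get))  # "yes" is surprisingly popular
    ```

    The minority answer wins because it outperforms expectations, which is exactly the override-the-consensus logic described above. ↩︎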

  4. To be more precise, The Emperor's New Clothes isn't a story about creating common knowledge from nothing. It's about transitioning between two self-consistent social equilibria — two gauge-invariant fixed points: a Basin A of pretense and a Basin B of truth. Basin A is metastable and higher energy, with the private knowledge of every citizen contradicting the public equilibrium. A barrier of social cost separates Basin A from Basin B, and the boy's utterance acts like an external field (it's a public utterance, not a private one) that resonates with every citizen's private perceptions because it is highly salient to private knowledge and free of ulterior motives (high scaling dimension, high signal-to-noise), explicitly breaking the symmetry holding the system in Basin A and allowing the system to approach an equilibrium in Basin B:

    Free Energy
       │
       │    ╱╲
       │   ╱  ╲          Basin A (pretense)
       │  ╱    ╲_______________
       │ ╱                     ╲
       │╱                       ╲
       │                         ╲_________  Basin B (truth)
       │
       └─────────────────────────────────────
                 Order Parameter
    

    From this perspective, stable institutions aren't just coordination equilibria — they're gauge-invariant fixed points with basins of attraction maintained by self-referential social dynamics. Institutional change requires external fields strong enough to overcome the barrier between basins. The "entrepreneurs" who catalyze transitions function like the child: they make public statements that name what everyone privately knows, creating signals with high enough scaling dimension to nucleate a new phase. Unfortunately, as North, Wallis, and Weingast noted, transitioning into a dysfunctional equilibrium is often easier than transitioning out. Limited-access orders are attractors: they're not necessarily lower-energy states, but the barrier to exit is higher than the barrier to entry. The transition to an open access order requires not just one child, but an infrastructure for producing lots of children — institutionalized free speech, protected dissent, multiple redundant channels for truth-telling that make the "external field" harder to suppress.

    So how does this work in a transformer? A strongly disambiguating context is like the child's utterance — it creates such a powerful field that the system jumps directly to the correct basin. A weak or absent context leaves the system near the saddle point between basins, producing the hedged or confused outputs we see when models face genuine ambiguity. Training might be understood as sculpting both the basins (what interpretations are stable) and the field sensitivities (how readily different signals tip the system between basins). A well-trained model has deep, well-separated basins for distinct meanings and appropriate field couplings that let context do its selective work. ↩︎
