Limits of the Transformer Architecture and a QCD-like Alternative

The transformer architecture has no physics below the token scale. You cannot ask "what is the next character" of a model trained on subword units: the question is literally undefined.
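To make this concrete, here is a minimal sketch using OpenAI's tiktoken library (an illustrative choice, not something the text specifies; any BPE tokenizer makes the same point). The model's vocabulary consists of multi-character subword units, so a distribution over "the next character" is not something the model defines:

```python
# A minimal sketch, assuming the tiktoken package is installed
# (pip install tiktoken). Any BPE tokenizer illustrates the same point.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4

token_ids = enc.encode("indefatigable")
pieces = [enc.decode_single_token_bytes(t) for t in token_ids]
print(pieces)
# Prints the subword pieces, e.g. something like
# [b'ind', b'ef', b'atig', b'able'] -- the exact split depends on the
# vocabulary. The model predicts whole units like b'atig'; "the next
# character" is below its resolution.
```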
The Transformer as Renormalization Group Flow

The forward pass through a transformer implements a Kadanoff-Wilson renormalization group flow, coarse-graining microscopic token representations into stable semantic attractors.
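A toy numerical sketch of the analogy, not a derivation: here each "layer" is a Kadanoff-style block average over neighboring positions, a crude stand-in for the mixing an attention layer performs. Repeated coarse-graining washes out fine-grained fluctuations, leaving only the slow component of the signal:

```python
# A toy sketch of the block-spin analogy; assumes nothing beyond numpy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))  # 64 "token" positions, 16-dim representations

def block_average(h, block=2):
    """One coarse-graining step: average adjacent blocks of positions."""
    n, d = h.shape
    return h[: n - n % block].reshape(-1, block, d).mean(axis=1)

h = x
while h.shape[0] > 1:
    h = block_average(h)
    # High-frequency differences shrink at each step (~1/sqrt(block size)
    # for independent noise), the coarse-graining behavior the RG analogy
    # points at.
    print(h.shape[0], "positions, std of fluctuations:",
          np.std(h - h.mean(axis=0)))
```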
GPT-4 Tuning for Writing Style

Here's an example of what you'd get if you asked GPT-4 "Please draft a three paragraph argument in favor of open borders":

Paragraph 1: Open borders have long been a subject of contentious debate; however, there are compelling reasons to consider the merits of ...
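For reference, reproducing a request like this programmatically might look like the following sketch. It assumes the openai Python package (v1-style client) and an API key in the environment; the model name and prompt are taken from the text above:

```python
# A minimal sketch, assuming the openai package (v1 client) is installed
# and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",  # model named in the text above
    messages=[
        {"role": "user",
         "content": "Please draft a three paragraph argument "
                    "in favor of open borders"},
    ],
)
print(response.choices[0].message.content)
```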