Limits of the Transformer Architecture and a QCD-like Alternative
The transformer architecture has no physics below the token scale. You cannot ask "what is the next character" of a model trained on subword units: its output distribution is defined over the subword vocabulary, so the question is literally undefined.
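To make the point about the token scale concrete, here is a minimal sketch of a next-token distribution. The vocabulary `VOCAB`, the logit values, and the helper names are all hypothetical, chosen only for illustration; no real tokenizer or model is implied.

```python
import math

# Hypothetical toy subword vocabulary (an assumption for illustration only).
VOCAB = ["the", " cat", " sat", " on", " mat"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(logits):
    """Map each subword unit in VOCAB to its probability.

    The sample space is exactly VOCAB: one outcome per subword unit.
    """
    if len(logits) != len(VOCAB):
        raise ValueError("need exactly one logit per vocabulary item")
    return dict(zip(VOCAB, softmax(logits)))


# Made-up logits standing in for the output of a model's final layer.
dist = next_token_distribution([2.0, 1.0, 0.5, -1.0, 0.0])
```

In this sketch `dist` has one entry per subword unit and nothing else. A single character such as "c" is not an event in the sample space unless it happens to be a vocabulary item in its own right.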
The Transformer as Renormalization Group Flow
The forward pass through a transformer implements a Kadanoff-Wilson renormalization group flow, coarse-graining microscopic token representations into stable semantic attractors.
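For readers unfamiliar with the Kadanoff picture, the sketch below shows a single block-spin coarse-graining step on a one-dimensional chain of ±1 spins, repeated across scales. It illustrates only the coarse-graining operation referred to above, using a majority rule as an assumed decimation scheme; it is not a model of a transformer, and the function names `block_spin` and `flow` are hypothetical.

```python
def block_spin(spins, block=3):
    """One Kadanoff block-spin step using a majority rule.

    Each consecutive group of `block` spins (+1 or -1) is replaced by
    the sign of its sum. Trailing spins that do not fill a whole block
    are dropped.
    """
    if block % 2 == 0:
        raise ValueError("block must be odd so the majority is never tied")
    usable = len(spins) - len(spins) % block
    return [
        1 if sum(spins[i:i + block]) > 0 else -1
        for i in range(0, usable, block)
    ]


def flow(spins, block=3):
    """Apply block_spin repeatedly, recording the chain at every scale."""
    scales = [list(spins)]
    while len(scales[-1]) >= block:
        scales.append(block_spin(scales[-1], block))
    return scales


# A 9-spin chain coarse-grains to 3 spins, then to 1.
scales = flow([1, 1, -1, -1, -1, 1, 1, 1, 1])
```

Each pass discards detail inside a block and keeps only its majority sign, so `scales` holds progressively coarser descriptions of the same chain. The analogy in the text treats successive layers of the forward pass as playing the role of these passes.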