Limits of the Transformer Architecture and a QCD-like Alternative
The transformer architecture has no physics below the token scale. If you trained on subword units, you cannot ask the model "what is the next character": its output distribution is defined over tokens, not characters, so the question is undefined in the model's own terms.
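To make the point about the token scale concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the six-entry vocabulary `VOCAB`, the logits, and the helper names `next_token_distribution` and `first_char_marginal` are hypothetical and do not come from any real tokenizer or model. It shows that the output head assigns probability only to vocabulary entries, and that a per-character number can only be obtained as a quantity derived from outside the model.

```python
import math

# Hypothetical toy subword vocabulary (not taken from any real tokenizer).
VOCAB = ["the", "th", "ing", "he", "at", " "]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(logits):
    """Map one logit per vocabulary entry to one probability per token."""
    if len(logits) != len(VOCAB):
        raise ValueError("expected one logit per vocabulary entry")
    return dict(zip(VOCAB, softmax(logits)))


def first_char_marginal(token_probs):
    """Derived quantity: sum token probabilities by first character.

    This is imposed from outside the model; nothing in the output head
    corresponds to it directly.
    """
    marginal = {}
    for token, p in token_probs.items():
        marginal[token[0]] = marginal.get(token[0], 0.0) + p
    return marginal


# Illustrative logits, chosen arbitrarily.
token_probs = next_token_distribution([2.0, 1.0, 0.5, 0.3, 0.1, 0.0])
char_probs = first_char_marginal(token_probs)

print("t" in token_probs)  # the bare character "t" is not an outcome
print(char_probs["t"])     # only available by summing over "the" and "th"
```

The model's sample space here is the six tokens. The value `char_probs["t"]` exists only because the caller chose a rule for collapsing tokens into characters; a different rule would give a different answer, and the model itself expresses no preference between them.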