51 Ways to Spell the Image Giraffe: The Hidden Politics of Token Languages in Generative AI
Ting-Chun Liu, Leon-Etienne Kühr
Generative AI models don't operate on human languages – they speak in tokens. Tokens are computational fragments that deconstruct language into subword units, stored in large dictionaries. These tokens encode not only language but also political ideologies, corporate interests, and cultural biases even before model training begins. Social media handles like realdonaldtrump, brand names like louisvuitton, or even !!!!!!!!!!!!!!!! exist as single tokens, while other words remain fragmented. Through various artistic and adversarial experiments, we demonstrate that tokenization is a political act that determines what can be represented and how images become computable through language.
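To make the contrast between whole and fragmented words concrete, the following is a minimal sketch of subword segmentation, not the tokenizer of any actual model. It assumes a tiny hypothetical vocabulary (`TOY_VOCAB`) and uses greedy longest-match lookup as a simplification. Production tokenizers typically rely on learned merge rules such as byte-pair encoding, so the splits they produce may differ. The names `TOY_VOCAB`, `EXCLAIM_RUN`, and `tokenize` are illustrative assumptions.

```python
# Hypothetical toy vocabulary for illustration only; it is NOT taken from
# any real model. It contains a few "privileged" whole-word entries and a
# handful of short fragments.
EXCLAIM_RUN = "!" * 16  # illustrative run of exclamation marks

TOY_VOCAB = {
    "realdonaldtrump",
    "louisvuitton",
    EXCLAIM_RUN,
    "gir", "aff", "e",
}


def tokenize(word, vocab):
    """Split `word` into vocabulary entries using greedy longest-match.

    This is a simplification: real tokenizers usually apply learned
    merge rules rather than a plain longest-match lookup.
    """
    tokens = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # No vocabulary entry starts here: emit an unknown marker
            # for this single character and move on.
            tokens.append("<unk>")
            pos += 1
    return tokens


if __name__ == "__main__":
    for word in ["louisvuitton", "realdonaldtrump", EXCLAIM_RUN, "giraffe"]:
        print(word, "->", tokenize(word, TOY_VOCAB))
```

Under this toy vocabulary, the brand name and the social media handle each map to a single token, while giraffe is split into three fragments. Which strings receive a token of their own is decided when the vocabulary is built, before any model training takes place.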