Researchers at UC Berkeley have developed an artificial intelligence system that generates entirely new languages from scratch, producing output that is both more internally consistent and more diverse than that of general-purpose large language models.
ConlangCrafter, detailed in a paper published by the Association for Computational Linguistics, applies structured linguistic rules to create constructed languages that adhere to their own internal grammar systems. The tool combines phonological constraints (sound patterns), morphosyntactic structures (word and sentence formation), and vocabulary generation while maintaining logical consistency across all components.
Beyond Human-Designed Languages
The system was developed by researchers including Gašper Beguš, an associate professor of linguistics at UC Berkeley, alongside collaborators at Carnegie Mellon University and Tel Aviv University. According to IEEE Spectrum AI, ConlangCrafter produces languages roughly twice as diverse as those generated by conventional LLMs such as Gemini-2.5-Pro, while adhering to its own rules nearly 70 percent more consistently.
Beguš emphasized the research value of machine-generated linguistic systems: "Models are able to imagine or come up with things that we might not, and we can learn so much from that." In one notable example, ConlangCrafter created a color- and gesture-based communication system modeled on hypothetical cephalopod speech patterns. While not a scientifically accurate account of actual octopus behavior, such outputs provide frameworks for studying communication models that are not centered on humans.
Technical Architecture
The platform allows researchers to specify particular linguistic parameters or let the system autonomously generate rule sets. Key features include:
- Phonological rule application for sound organization
- Morphosyntactic framework generation for word and sentence structure
- Vocabulary construction within defined linguistic parameters
- Randomization mechanisms ensuring output diversity
- Built-in error-checking loops that identify and resolve internal contradictions
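The features listed above can be illustrated with a toy sketch. This is not ConlangCrafter's actual implementation, which is not described in detail here; every name, sound pool, and rule below is a hypothetical stand-in. The sketch assumes a seeded random generator for diversity, a syllable template as the only phonological rule, a basic word order as the only morphosyntactic choice, and a simple regenerate-on-violation loop for error checking.

```python
import random

# Hypothetical toy pools; these values are illustrative assumptions only.
CONSONANT_POOL = list("ptkmnslrwjbdg")
VOWEL_POOL = list("aeiou")
WORD_ORDERS = ["SOV", "SVO", "VSO"]
MEANINGS = ["water", "stone", "see", "give"]


def generate_phonology(rng):
    """Randomly pick a phoneme inventory and a syllable template."""
    return {
        "consonants": sorted(rng.sample(CONSONANT_POOL, 6)),
        "vowels": sorted(rng.sample(VOWEL_POOL, 3)),
        "syllable": rng.choice(["CV", "CVC"]),
    }


def generate_grammar(rng):
    """Randomly pick a basic word order (a stand-in for morphosyntax)."""
    return {"word_order": rng.choice(WORD_ORDERS)}


def make_word(rng, phonology, syllables=2):
    """Build a word by filling the syllable template with inventory sounds."""
    slots = {"C": phonology["consonants"], "V": phonology["vowels"]}
    return "".join(
        rng.choice(slots[slot])
        for _ in range(syllables)
        for slot in phonology["syllable"]
    )


def is_consistent(word, phonology):
    """Check that a word uses only inventory sounds and fits the template."""
    template = phonology["syllable"]
    if not word or len(word) % len(template) != 0:
        return False
    for i, ch in enumerate(word):
        slot = template[i % len(template)]
        allowed = phonology["consonants"] if slot == "C" else phonology["vowels"]
        if ch not in allowed:
            return False
    return True


def find_violations(lexicon, phonology):
    """Return the meanings whose word forms break the phonological rules."""
    return [m for m, w in lexicon.items() if not is_consistent(w, phonology)]


def repair_lexicon(rng, lexicon, phonology, max_rounds=10):
    """Error-checking loop: regenerate entries until no violations remain."""
    rounds = 0
    bad = find_violations(lexicon, phonology)
    while bad:
        if rounds == max_rounds:
            raise RuntimeError("could not resolve contradictions")
        for meaning in bad:
            lexicon[meaning] = make_word(rng, phonology)
        rounds += 1
        bad = find_violations(lexicon, phonology)
    return lexicon


def generate_language(seed):
    """Run the toy pipeline: phonology, grammar, lexicon, then checking."""
    rng = random.Random(seed)  # the seed makes each language reproducible
    phonology = generate_phonology(rng)
    grammar = generate_grammar(rng)
    lexicon = {m: make_word(rng, phonology) for m in MEANINGS}
    lexicon = repair_lexicon(rng, lexicon, phonology)
    return {"phonology": phonology, "grammar": grammar, "lexicon": lexicon}
```

Calling `generate_language(1)` and `generate_language(2)` draws from different random streams, so the two languages will generally differ, while calling either twice returns the same language. The design point the sketch tries to capture is the separation between generation and checking: the checker only knows the rules, so it can catch contradictions regardless of how a word was produced.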
Morris Alper, a postdoctoral researcher at Carnegie Mellon who specializes in multimodal machine learning and computational linguistics, explained the dual design goals: "You want languages to be creative so they're all different from each other. You also want them to be consistent, because a language is like a system of rules, and those rules shouldn't contradict each other."
Research Applications
The tool addresses a significant gap in natural language processing research. David Mortensen, an assistant research professor at Carnegie Mellon's Language Technologies Institute, noted that ConlangCrafter could facilitate controlled studies of how linguistic structure affects AI model performance. "There is a substantial body of research suggesting that linguistic structure affects model performance, but hypotheses in this area have been very hard to evaluate," Mortensen said.
By generating diverse, rule-consistent languages on demand, ConlangCrafter enables researchers to systematically test how factors like language typology and lexical density influence machine learning system behavior. This capability could provide empirical grounding for linguistic structure research that has previously relied on limited natural language samples.
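As a rough illustration of what such systematic testing could look like, the sketch below enumerates a full factorial grid of experimental conditions. The factor names, their values, and the `build_conditions` helper are hypothetical choices made for this example; they are not parameters documented for ConlangCrafter.

```python
from itertools import product

# Hypothetical experimental factors; the values are illustrative assumptions.
WORD_ORDERS = ["SOV", "SVO", "VSO"]
MORPHOLOGY_TYPES = ["isolating", "agglutinative", "fusional"]
LEXICAL_DENSITIES = [0.4, 0.6]  # assumed share of content words in a text


def build_conditions(languages_per_cell=3):
    """List every factor combination, with several languages per combination."""
    conditions = []
    for order, morphology, density in product(
        WORD_ORDERS, MORPHOLOGY_TYPES, LEXICAL_DENSITIES
    ):
        for _ in range(languages_per_cell):
            conditions.append(
                {
                    "word_order": order,
                    "morphology": morphology,
                    "lexical_density": density,
                    # A unique seed per language keeps replicates distinct.
                    "seed": len(conditions),
                }
            )
    return conditions
```

Each entry would describe one language to request from a generator. Holding two factors fixed while varying the third is what lets a researcher attribute a change in model behavior to that one factor, something that is hard to do with natural languages, where these properties cannot be varied independently.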
The work represents a shift in how researchers conceptualize AI's role in linguistics: not merely analyzing existing language systems, but generating novel ones that expand the sample space available for computational investigation.
