A new open-source artificial intelligence system can automatically map the informal networks connecting European political elites by analyzing vast news archives, addressing a longstanding challenge in computational social science.

Researchers Kirill Solovev and Jana Lasser have built a modular pipeline that uses large language models and named-entity recognition to extract political relationships from unstructured text across multiple languages. Rather than relying on proprietary AI services, the system operates on open-weight models and uses guided decoding to produce structured knowledge graphs showing who connects to whom in Europe's corridors of power.

Solving Scale and Accuracy Problems

Traditionally, studying how political elites form coalitions or networks required researchers to manually read and code thousands of documents. According to the paper posted on arXiv, the new approach combines several machine learning techniques: span-based named-entity recognition identifies people and organizations, a three-stage linking cascade matches those mentions to Wikidata identifiers so that entities resolve the same way in any language, and a mixture-of-experts model constrained by domain rules extracts directed relationships marked as positive or adversarial.
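The description above does not say what the three linking stages are. As a rough illustration only, a cascade of this kind might try an exact alias match first, then a normalized match, then a fuzzy match, stopping at the first stage that succeeds. The alias table, the placeholder identifiers, the stage names and the `link` function below are all assumptions made for this sketch, not the authors' implementation.

```python
import difflib
import unicodedata

# Hypothetical alias table: surface form -> placeholder identifier.
# Real Wikidata identifiers look like "Q42"; these are made-up stand-ins.
ALIASES = {
    "Civic Platform": "Q-EXAMPLE-1",
    "Platforma Obywatelska": "Q-EXAMPLE-1",
    "Law and Justice": "Q-EXAMPLE-2",
    "Prawo i Sprawiedliwość": "Q-EXAMPLE-2",
}


def normalize(text):
    """Lowercase, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


# Lookup table keyed by the normalized form of each alias.
NORMALIZED = {normalize(alias): qid for alias, qid in ALIASES.items()}


def link(mention, cutoff=0.85):
    """Return (entity_id, stage), or (None, "unlinked") if every stage fails."""
    # Stage 1 (assumed): exact match on the surface form.
    if mention in ALIASES:
        return ALIASES[mention], "exact"
    # Stage 2 (assumed): match after case and accent normalization.
    key = normalize(mention)
    if key in NORMALIZED:
        return NORMALIZED[key], "normalized"
    # Stage 3 (assumed): fuzzy string match against the normalized aliases.
    close = difflib.get_close_matches(key, list(NORMALIZED), n=1, cutoff=cutoff)
    if close:
        return NORMALIZED[close[0]], "fuzzy"
    return None, "unlinked"
```

Because the English and Polish party names map to the same identifier, a mention in either language resolves to one node in the graph, which is the point of linking to a language-independent identifier.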

The system's accuracy metrics show promise for large-scale deployment. Against a gold-standard dataset of 3,491 relationships, the pipeline achieved 68.2 percent accuracy under strict evaluation and 93.7 percent under lenient scoring, meaning that by the looser criterion it identifies the core facts in the vast majority of cases.
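The two scoring modes are not defined here. One plausible reading, assumed for this sketch, is that strict scoring requires source, target, direction and polarity all to match, while lenient scoring only requires the right pair of entities. The `Relation` type, the matching rules and the toy data are hypothetical and only show how the same predictions can earn two different scores.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    source: str
    target: str
    polarity: str  # "positive" or "adversarial"


def strict_match(pred, gold):
    """Assumed strict rule: source, target (so direction) and polarity all agree."""
    return pred == gold


def lenient_match(pred, gold):
    """Assumed lenient rule: the same two entities, ignoring direction and polarity."""
    return {pred.source, pred.target} == {gold.source, gold.target}


def accuracy(preds, golds, match):
    """Share of aligned prediction/gold pairs accepted by the given rule."""
    hits = sum(1 for p, g in zip(preds, golds) if match(p, g))
    return hits / len(golds)


# Toy gold standard and predictions, aligned by position (made-up labels).
GOLD = [
    Relation("A", "B", "positive"),
    Relation("C", "D", "adversarial"),
    Relation("E", "F", "positive"),
    Relation("G", "H", "adversarial"),
]
PRED = [
    Relation("A", "B", "positive"),     # fully correct
    Relation("D", "C", "adversarial"),  # direction flipped
    Relation("E", "F", "adversarial"),  # polarity wrong
    Relation("G", "X", "adversarial"),  # wrong target entity
]

strict_score = accuracy(PRED, GOLD, strict_match)
lenient_score = accuracy(PRED, GOLD, lenient_match)
```

On this toy data the strict score is 0.25 and the lenient score is 0.75. The gap comes entirely from predictions that find the right pair of entities but get the direction or polarity wrong.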

Real-World Validation in Austria and Poland

The researchers tested their pipeline on two significant case studies. In Austria, the system successfully reconstructed a political party's complete history from news text, identifying internal splits, tracking where personnel moved after leaving, and discovering connections to subsequent legal convictions. The approach flagged exact dates when fractures occurred within party structures.

In Polish news archives, the system uncovered overlapping networks of state-enterprise patronage and revealed the balance of conflict between Poland's two dominant parties, Civic Platform and Law and Justice. The knowledge graph captured the structural properties of their political rivalry across years of reporting.

Why This Matters

This work addresses a genuine bottleneck in how political scientists study elite networks. Previous automated approaches limited themselves to simple co-occurrence: noting when two people appeared in the same article without capturing the nature of their relationship. Modern LLMs offered promise but came with problems: most rely on expensive proprietary APIs from companies like OpenAI, struggle when applied across different languages simultaneously, and lack robust tools for matching entity mentions across texts and sources.
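To make the limitation of co-occurrence concrete, here is a minimal sketch of such a baseline. The article sets and entity labels are invented for illustration, and `cooccurrence_counts` is a hypothetical helper rather than code from any earlier system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: the set of entities mentioned in each article.
ARTICLES = [
    {"Politician A", "Politician B"},
    {"Politician A", "Politician B", "Party X"},
    {"Politician B", "Party X"},
]


def cooccurrence_counts(docs):
    """Count how many documents mention each unordered pair of entities."""
    counts = Counter()
    for entities in docs:
        # Sorting gives every pair one canonical ordering.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(ARTICLES)
```

The result says that "Politician A" and "Politician B" appear together in two articles, but nothing about who acted toward whom or whether the tie is supportive or hostile. That is the information a directed, signed relation adds.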

By combining open models with careful ontology design and entity linking, Solovev and Lasser created a system that other researchers can run independently on their own hardware. The approach scales to process massive news corpora without recurring API costs.

The framework also produces temporal data showing how networks evolve. That matters for understanding how political coalitions form and dissolve over months and years, rather than only how they look in a single snapshot.
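As a small sketch of what dated relationships make possible, assume each extracted edge carries the publication date of the article it came from. The edge layout, the dates and both helper functions are hypothetical and are not taken from the pipeline itself.

```python
from datetime import date

# Hypothetical dated edges: (source, target, polarity, article date).
EDGES = [
    ("A", "B", "positive", date(2015, 3, 1)),
    ("A", "B", "adversarial", date(2017, 6, 15)),
    ("B", "C", "positive", date(2017, 9, 2)),
]


def snapshot(edges, start, end):
    """Edges dated within the half-open window [start, end)."""
    return [e for e in edges if start <= e[3] < end]


def polarity_by_year(edges, source, target):
    """Map each year to the polarities observed for one directed pair."""
    timeline = {}
    for src, tgt, polarity, when in edges:
        if (src, tgt) == (source, target):
            timeline.setdefault(when.year, []).append(polarity)
    return timeline


edges_2017 = snapshot(EDGES, date(2017, 1, 1), date(2018, 1, 1))
a_to_b = polarity_by_year(EDGES, "A", "B")
```

In this toy data the tie from A to B is positive in 2015 and adversarial in 2017, the kind of shift that a single undated graph would flatten into one ambiguous edge.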

Implications for Comparative Politics

The pipeline opens possibilities for studying governance questions across many countries simultaneously. Researchers can now ask: which systems tend toward captured rent-seeking networks versus inclusive civic institutions? How do these structures differ between democracies with different institutional designs?

The system's multilingual capability means a single analysis can span countries without requiring separate models or manual translation. That addresses a practical constraint that has kept much computational social science focused on English-language sources.

As generative AI capabilities mature, tools like this demonstrate one productive direction: building structured datasets from unstructured text to enable empirical social science at unprecedented scale.