
New Benchmark Exposes AI Agents' Struggle With Changing Environments
Researchers reveal that language model agents fail to adapt effectively when conditions shift, proposing a memory-tracking solution to improve real-world deployment.
Papers, breakthroughs, benchmarks, and the long-arc trends shaping artificial intelligence. arXiv highlights and lab announcements, distilled.

Researchers reveal that language model agents fail to adapt effectively when conditions shift, proposing a memory-tracking solution to improve real-world deployment.

InterleaveThinker uses multi-agent planning to unlock sequential visual storytelling capabilities in existing image models.

Researchers develop retrieval method that helps language models solve complex problems by finding structurally similar examples rather than semantically matching ones.

Google's AI lab launches major initiative to study risks of coordinated autonomous systems as multi-agent AI grows in complexity.

Researchers developed a software-based approach that lets standard industrial robots feel contact and pressure, opening doors for more dexterous manipulation tasks.

Research shows general-purpose language models close the gap with specialized systems through optimized input configurations, not architectural changes.

Researchers propose dynamic token routing to preserve image details that static pruning methods permanently lose during processing.

A compression technique called C-DIC could help conversational AI systems maintain quality during long exchanges without slowing down.

A new training framework cuts convergence time in half while improving accuracy for world models that predict future video frames.

New research reveals frontier LLMs fall short of human experts when measured for consistency and error magnitude, not just accuracy.

New diffusion-based approach to language models promises faster inference without sacrificing output quality.

Researchers demonstrate how discrete token representations and preference optimization enable a single model to handle multiple vision tasks.

Researchers propose a unified approach to supervised fine-tuning that moves beyond rigid token matching, potentially improving how language models absorb training data.

Researchers develop a diagnostic tool to help practitioners choose the right multimodal learning approach before training.

New Live Translate feature brings instantaneous, conversational voice interpretation to productivity and communication apps.

Tech giant commits resources to advance machine learning capabilities for robotic systems across Europe's emerging automation sector.

New compact architecture combines vision and language processing without separate encoders, reshaping efficiency standards for edge deployment.

Researchers introduce MemoryVLA++, a system that equips vision-language models with temporal reasoning to handle long-horizon robotic manipulation.