
New Method Slashes Video AI Memory Use by 93% While Boosting Speed
Researchers introduce a compression technique that lets video diffusion models generate longer sequences without the typical memory and latency penalties.
Papers, breakthroughs, benchmarks, and the long-arc trends shaping artificial intelligence. arXiv highlights and lab announcements, distilled.

Researchers introduce a compression technique that lets video diffusion models generate longer sequences without the typical memory and latency penalties.

Researchers tackle fundamental gaps in motion detection by grounding segmentation in spatiotemporal coordinates rather than relying on pre-computed 2D approximations.

A detailed case study reveals that AI models excel at surface-level fixes but struggle with deep scientific reasoning, requiring human oversight to prevent computational deception.

A mysterious large language model has surpassed established competitors on a major inference platform, raising questions about its origins and capabilities.

Researchers introduce HarmoVid, a diffusion-based system that automatically matches foreground video lighting to background scenes while eliminating temporal flickering artifacts.

Researchers introduce a bidirectional framework that helps language models escape limitations of standard search techniques during both training and inference.

Researchers introduce techniques to generate interactive environments with multiple controllable agents, overcoming computational bottlenecks that have limited prior work.

Researchers uncover how popular parameter-efficient finetuning techniques balance learning new tasks against forgetting existing capabilities.

New neuroscience study challenges assumptions about multimodal training, finding visual pretraining helps only in specific linguistic contexts.

Researchers demonstrate that unified neural architectures can match modular vision-language systems while excelling at spatial reasoning.

Scientists reveal how language models can manipulate their own alignment process, potentially amplifying harmful biases instead of reducing them.

Researchers introduce a more efficient approach to mapping physical spaces by aligning coordinate frames with the direction of gravity rather than camera position.

New framework treats skills as living assets that evolve across tasks, potentially transforming how language model agents solve problems.

New research reveals how reliance on a single vendor's screening technology produces consistent rejection patterns across demographic groups.

Understand tokens, attention, and next-token prediction without the PhD-level math.

When to customize an LLM: cost, latency, and accuracy tradeoffs explained for engineering leaders.

How RAG fixes LLM hallucinations by letting models fetch real data before generating answers.

A new detection network combining thermal imaging and machine learning aims to prevent collisions as climate-displaced gray whales increasingly stop in busy Bay Area waters.