Reinforcement learning (RL) is the next frontier, Google is surging, and the party scene has gotten completely out of hand. Those were the through lines from this year's NeurIPS in San Diego. NeurIPS, or the "Conference on Neural Information Processing Systems," started in 1987 as a purely academic affair. It has since ballooned alongside the hype around AI into a massive industry event where labs come to recruit and investors come to find the next wave of AI startups.
"Our latest Olaf is a fantastic example of representing an animated character as authentically as possible in the physical world-a challenging task because animated characters most often move in non-physical ways," Kyle Laughlin, senior vice president of Walt Disney Imagineering Research & Development, said in a news release . "For example, to make Olaf's snowball feet move along his body, we paired state-of-the-art deep reinforcement learning with an artistic interface and advances in mechanical design."
The Allen Institute for Artificial Intelligence has launched Olmo 3, an open-source language model family that offers researchers and developers comprehensive access to the entire model development process. Unlike earlier releases that provided only final weights, Olmo 3 includes checkpoints, training datasets, and tools for every stage of development, encompassing pretraining and post-training for reasoning, instruction following, and reinforcement learning.
A long-standing goal of artificial intelligence is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean [1] offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments.
Meta's PyTorch team and Hugging Face have unveiled OpenEnv, an open-source initiative designed to standardize how developers create and share environments for AI agents. At its core is the OpenEnv Hub, a collaborative platform for building, testing, and deploying "agentic environments," secure sandboxes that specify the exact tools, APIs, and conditions an agent needs to perform a task safely, consistently, and at scale.
But reasoning models have changed the game, Midha said, referring to the new generation of AI systems designed to "reason" through problems step by step, mimicking logic and reflection rather than predicting the next word in a sequence. These models can evaluate their own outputs better, break complex tasks into sub-tasks, and learn from feedback, potentially bringing AI closer to complex, real-world problem-solving.
A few months ago, I asked ChatGPT to recommend books by and about Hermann Joseph Muller, the Nobel Prize-winning geneticist who showed how X-rays can cause mutations. It dutifully gave me three titles. None existed. I asked again. Three more. Still wrong. By the third attempt, I had an epiphany: the system wasn't just mistaken, it was making things up.